Abstract

We consider the bit-probe complexity of the set membership problem, where a set $S$ of size at most $n$ from a universe of size $m$ is to be represented as a short bit vector in order to answer membership queries of the form “Is $x$ in $S$?” by adaptively probing the bit vector at $t$ places. Let $s(m,n,t)$ be the minimum number of bits of storage needed for such a scheme. Several recent works investigate $s(m,n,t)$ for various ranges of the parameters; we obtain the following improvements over the bounds shown by Buhrman, Miltersen, Radhakrishnan, and Venkatesh [5] and Alon and Feige [2].

For two probes (t = 2): 

(a) $s(m,n,2) = O(m^{1-\frac{1}{4n+1}})$; this improves on a result of Alon and Feige that states that for $n \le \lg m$, $s(m,n,2) = O(mn\lg((\lg m)/n)/\lg m)$.

(b) $s(m,n,2) = \Omega(m^{1-\frac{1}{\lfloor n/4\rfloor+1}})$; in particular, $s(m,n,2) = \Omega(m)$ for $n \ge \lg m$, that is, if $s(m,n,2) = o(m)$ (significantly better than the characteristic vector representation), then $n = o(\lg m)$.


For three probes ($t = 3$): $s(m,n,3) = O(\sqrt{mn\lg(2m/n)})$. This improves a result of Alon and Feige that states that $s(m,n,3) = O(m^{2/3}n^{1/3})$.

In general: 

(a) (Non-adaptive schemes) For odd $t \ge 5$, there is a non-adaptive scheme using $O(t\,m^{\frac{4}{t+1}}n^{1-\frac{4}{t+1}}\lg\frac{2m}{n})$ bits of space. This improves on a result of Buhrman et al. [5] that states that for odd $t \ge 5$, there exists a non-adaptive scheme that uses $O(t\,m^{\frac{4}{t+1}}n)$ bits of space.

(b) (Adaptive schemes) For odd $t \ge 3$ and $t \le c\lg\lg m$ (for a suitable constant $c > 0$), and for $n \le m^{1-\epsilon}$ ($\epsilon > 0$), we have $s(m,n,t) = O(\exp(e^{t})\,m^{\frac{2}{t+1}}n^{1-\frac{2}{t+1}}\lg m)$. Previously, for $t \ge 5$, no adaptive scheme was known that was more efficient than the non-adaptive scheme due to Buhrman et al. [5], which uses $O(t\,m^{\frac{4}{t+1}}n)$ bits of space.

(c) If $t \ge 3$ and $4^t \le n$, then $s(m,n,t) \ge \frac{1}{4}m^{\frac{1}{t-1}\left(1-\frac{4^t}{n}\right)}$. For $n \le \lg m$, this improves on the lower bound $s(m,n,3) = \Omega(\sqrt{mn/\lg m})$ (valid only for $n \ge 16\lg m$ and for non-adaptive schemes) due to Alon and Feige; for small values of $n$, it also improves on the lower bound $s(m,n,t) = \Omega(t\,m^{1/t}n^{1-1/t})$ due to Buhrman et al. [5].

Key words: Data structures. Bit-probe model. Compression, Bloom filters. Graphs of large 
girth. Expansion. 
1 Introduction


We study the static set membership problem: given a subset $S$ of $[m]$, represent it in memory so that membership queries can be answered using a small number of bit probes (we assume random access is allowed into the memory). Standard solutions to the set membership problem can be examined in this light. (We use lg to mean logarithm to the base two.)

The characteristic vector: Sets can be represented as a bit-string of length $m$, and membership queries are answered using a single bit probe. However, this representation is not sensitive to the number of elements in the set, which can be much smaller than $m$.

The sorted table: Suppose the set $S$ has $n$ elements. Using the standard representation of elements of the universe in $\lg m$ bits, we may store $S$ in memory as a sorted table of $n\lg m$ bits. Queries can then be answered using binary search taking about $(\lg m)(\lg n)$ bit probes in the worst case.
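The $(\lg m)(\lg n)$ probe count for the sorted table can be checked with a small bit-level simulation. The sketch below is an illustration under stated assumptions, not code from the paper: the universe is taken to be $\{0, \ldots, m-1\}$ rather than $[m]$, each element is stored in $w = \lceil \lg m \rceil$ bits, most significant bit first, and the names `build_table`, `compare` and `member` are hypothetical. Each comparison reads at most $w$ bits, so a query reads at most $w(\lfloor \lg n \rfloor + 1)$ bits.

```python
class ProbeCounter:
    """Read-only view of a bit list that counts how many bits are probed."""

    def __init__(self, bits):
        self.bits = bits
        self.probes = 0

    def __getitem__(self, index):
        self.probes += 1
        return self.bits[index]


def build_table(S, m):
    """Store the sorted elements of S (from {0, ..., m-1}) in w bits each, MSB first."""
    w = max(1, (m - 1).bit_length())
    bits = []
    for x in sorted(S):
        bits.extend((x >> (w - 1 - j)) & 1 for j in range(w))
    return bits, w


def compare(mem, slot, w, x):
    """Compare x with the element in `slot`, reading its bits MSB first until they differ."""
    for j in range(w):
        stored = mem[slot * w + j]
        wanted = (x >> (w - 1 - j)) & 1
        if wanted != stored:
            return -1 if wanted < stored else 1
    return 0


def member(bits, w, x):
    """Binary search over the sorted table; returns (answer, number of bit probes)."""
    mem = ProbeCounter(bits)
    lo, hi = 0, len(bits) // w - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        c = compare(mem, mid, w, x)
        if c == 0:
            return True, mem.probes
        if c < 0:
            hi = mid - 1
        else:
            lo = mid + 1
    return False, mem.probes


bits, w = build_table({3, 100, 517, 900, 1000}, 1024)
print(member(bits, w, 517))   # (True, 10)
```

Here the middle entry is 517, so a single full comparison of $w = 10$ bits answers the query.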

The static membership problem in the bit-probe model (in contrast to the more common cell-probe model) was already studied (in the average case) by Minsky and Papert in their 1969 book Perceptrons [11]. More recently, the worst-case space-time trade-off for this problem was considered by Buhrman, Miltersen, Radhakrishnan and Venkatesh [5] and in several subsequent
works [a 0 da na E]. The set membership problem for sets where each element is included 
with probability $p$ was considered by Makhdoumi, Huang, Medard and Polyanskiy [4]; they showed, in particular, that no savings over the characteristic vector can be obtained in this case for non-adaptive schemes with $t = 2$.

To describe the previous results and our contributions formally, we will use the following 
definitions. 

Definition 1.1. An $(m,n,s)$-storing scheme is a method for representing a subset of size at most $n$ of a universe of size $m$ as an $s$-bit string. Formally, an $(m,n,s)$-storing scheme is a map $\phi$ from $\binom{[m]}{\le n}$ to $\{0,1\}^s$. A deterministic $(m,s,t)$-query scheme is a family $\{T_u\}_{u\in[m]}$ of $m$ Boolean decision trees of depth at most $t$. Each internal node in a decision tree is marked with an index between 1 and $s$, indicating the address of a bit in an $s$-bit data structure. For each internal node, there is one outgoing edge labeled “0” and one labeled “1”. The leaf nodes of every tree are marked ‘Yes’ or ‘No’. Such a tree induces a map from $\{0,1\}^s$ to $\{\text{Yes}, \text{No}\}$; this map will also be referred to as $T_u$. An $(m,n,s)$-storing scheme $\phi$ and an $(m,s,t)$-query scheme $\{T_u\}_{u\in[m]}$ together form an $(m,n,s,t)$-scheme if $\forall S \in \binom{[m]}{\le n}\ \forall u \in [m]: T_u(\phi(S)) = \text{Yes}$ if and only if $u \in S$. Let $s(m,n,t)$ be the minimum $s$ such that there is an $(m,n,s,t)$-scheme.¹

We say that an (m, n, s, t)-scheme is systematic if the value returned by each of its trees 
is equal to the last bit it reads (interpreting 0 as No/False and 1 as Yes/True). 
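Definition 1.1 can be made concrete with a brute-force checker for tiny parameters. In this sketch (an illustration with hypothetical names, not the paper's notation), a decision tree is either a leaf `True`/`False` (‘Yes’/‘No’) or a triple `(address, subtree_if_0, subtree_if_1)`; the universe is $\{0, \ldots, m-1\}$ and addresses run from $0$ to $s-1$. The characteristic vector serves as the example scheme; it is systematic, since each tree answers with the single bit it reads.

```python
from itertools import combinations


def evaluate(tree, memory):
    """Run a decision tree on a bit string; return (answer, number of probes)."""
    probes = 0
    while not isinstance(tree, bool):
        address, if_zero, if_one = tree
        probes += 1
        tree = if_one if memory[address] else if_zero
    return tree, probes


def is_scheme(m, n, t, store, trees):
    """Check T_u(store(S)) == (u in S) with at most t probes, for every S of size <= n."""
    for size in range(n + 1):
        for S in combinations(range(m), size):
            memory = store(frozenset(S))
            for u in range(m):
                answer, probes = evaluate(trees[u], memory)
                if probes > t or answer != (u in S):
                    return False
    return True


# Characteristic vector: s = m bits, one probe, and tree u answers with bit u.
m = 5
char_store = lambda S: [int(u in S) for u in range(m)]
char_trees = [(u, False, True) for u in range(m)]
print(is_scheme(m, 2, 1, char_store, char_trees))   # True
```

The check enumerates all $\sum_{k \le n}\binom{m}{k}$ sets, so it is only meant for toy sizes.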

Remark 1.2. Note that this definition describes a non-uniform model and ignores the important 
issue of uniformly representing the decision trees in the query algorithm. Furthermore, disregarding the fact that in practice memory is organized in words, it focuses attention on the
fundamental trade-off between the compactness of information representation and the efficiency 
of information extraction in the context of the set membership problem. The upper bounds 
derived in this model are not always realistic (they sometimes rely on probabilistic existence 
arguments); however, lower bounds derived here are generally applicable. 

The main focus of Buhrman et al. was the randomized version of the above schemes; they showed that membership queries can be answered correctly with probability $1-\epsilon$ by making just one bit probe into a representation of size $O(\frac{n}{\epsilon^2}\lg m)$ bits. They also showed the following lower and upper bounds for deterministic schemes: (i) $s(m,n,t) = \Omega(t\,m^{1/t}n^{1-1/t})$, valid when $n \le m^{1-\epsilon}$ (for $\epsilon > 0$ and $t \ll \lg m$), and (ii) $s(m,n,t) = O(t\,m^{\frac{4}{t+1}}n)$ for odd $t \ge 5$. However, Buhrman et al. left open the question of whether a scheme better than the characteristic vector was possible for $t = 2, 3, 4$ and $n$ large. Alon and Feige [2], in their paper “On the power of two, three and four probes,” addressed this shortcoming. Our contributions are closely related to theirs.

¹In the literature this function is often written as $s(n,m,t)$; we list the parameters in alphabetical order.

For two probes, Alon and Feige [2] show the following.

Theorem 1.3. For $n \le \lg m$, $s(m,n,2) = O\left(\frac{mn\lg((\lg m)/n)}{\lg m}\right)$. Thus, $s(m,n,2) = o(m)$ whenever $n = o(\lg m)$.

They state: 



There are still rather substantial gaps between the upper and lower bounds for 
the minimum required space in most cases considered here; it will be nice to get 
tighter estimates. In particular, it will be interesting to decide if there are adaptive 
(m, n, s, 2)-schemes with s < m, for n ≥ √m/2, and to identify the behavior of the largest n = n(m) so that there are adaptive (m, n, s, 2)-schemes with s = o(m).

In this paper, we address this by showing the following. (We assume m is large; all asymptotic 
claims made below hold for large m.) 

Theorem 1.4 (Result 1). (a) There is a constant $C > 0$ such that for all large $m$, $s(m,n,2) \le C \cdot m^{1-\frac{1}{4n+1}}$.

(b) Let $4 \le n$. There is a constant $D > 0$ such that for all large $m$, $s(m,n,2) \ge D\,m^{1-\frac{1}{\lfloor n/4\rfloor+1}}$.

For three probes, Alon and Feige [2] show that $s(m,n,3) = O(m^{2/3}n^{1/3})$. Their query scheme is adaptive and based on random graphs. We show the following.

Theorem 1.5 (Result 2). $s(m,n,3) = O(\sqrt{mn\lg(2m/n)})$.

This scheme is adaptive. For small values of $n$, this result comes close to the lower bound shown below in Theorem 1.8. We further generalize this construction to larger values of $t$.

Theorem 1.6 (Result 3, non-adaptive schemes). For odd $t \ge 5$, there is a non-adaptive scheme using $O(t\,m^{\frac{4}{t+1}}n^{1-\frac{4}{t+1}}\lg\frac{2m}{n})$ bits of space.

This improves on a result of Buhrman et al. [5] that states that for odd t > 5 and n < m^~^, 
there exists a non-adaptive scheme that uses $O(t\,m^{\frac{4}{t+1}}n)$ bits of space. These schemes, as well as the non-adaptive scheme for $t = 4$ due to Alon and Feige [2], have implications for the problem studied by Makhdoumi et al. [4]; unlike in the case of $t = 2$, significant savings are possible if $t \ge 4$, even with non-adaptive schemes.²

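Theorem 1.7 (Result 4, adaptive schemes). For odd $t \ge 3$ and $t \le c\lg\lg m$ (for a suitable constant $c > 0$), and for $n \le m^{1-\epsilon}$ ($\epsilon > 0$), we have $s(m,n,t) = O(\exp(e^{t})\,m^{\frac{2}{t+1}}n^{1-\frac{2}{t+1}}\lg m)$.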


We observe that the two-probe lower bound shown above can be used to derive slightly better lower bounds for $t \ge 3$.

Theorem 1.8 (Result 5). If $4^t \le n$, then $s(m,n,t) \ge \frac{1}{4}m^{\frac{1}{t-1}\left(1-\frac{4^t}{n}\right)}$.

In particular, for $t = 3$ and $n \approx \lg m$, this gives an $\Omega(\sqrt{m})$ bound, whereas the previous best bound [5] was of the form $\Omega(t\,n^{2/3}m^{1/3})$.

²We are grateful to Tom Courtade and Ashwin Pananjady for this observation.
What is new, what is old: As stated before, this work is closely related to the paper of Alon and Feige [2]. For two probes, they explicitly modeled their problem using graphs,
and translated the high girth of the graphs to their expansion. This allowed them to use 
Hall’s matching theorem to avoid conflict while allocating memory locations to elements of the 
universe. We borrow the idea of using graphs of high-girth but we do not reduce the allocation to 
a matching theorem. Instead, we observe that the constraints in this case can be written down 
as a 2-SAT expression. Furthermore, if the graph has high girth then this 2-SAT expression 
must be satisfiable and we will be able to represent our set successfully. Working with 2-SAT 
instead of the matching problem allows us to show a stronger upper bound. For the lower 
bound we turn the argument on its head: we show roughly that any valid two-probe scheme 
must conceal a certain dense graph that avoids small cycles. Standard graph theoretic results 
(the Moore bound) that relate density and girth then deliver us the lower bound. We believe this 
approach via 2-SAT offers a better understanding of the connection between two-probe schemes 
and graphs of high girth. 

Our three-probe scheme (Theorem 1.5) is based on the following idea. We must ensure that the data structure returns the answer ‘Yes’ for all query elements in $S$ and ‘No’ for all elements in $R$ (in the end we would want $R = [m] \setminus S$). If $R$ is small, then this can be arranged using Hall’s theorem, by slightly extending the argument used by Alon and Feige [2] for their three-probe scheme. But we still need to take care of large $R$. We notice that the last two probes of a
three-probe scheme induce two-probe schemes (precisely how this comes about is not important 
here). We will show that whenever R is large, there is always an element in it that cannot appear 
in a short cycle in these two-probe schemes. That is, we may peel this element away, work on 
the rest, and then make appropriate adjustments to accommodate this element. A form of this 
argument has been used in the randomized schemes of Buhrman et al. [5]; it appears in the
literature in other contexts, such as Invertible Bloom Lookup Tables [7] and graph based LDPC 
codes [8]. Our scheme is not explicit, for it relies on random graphs that are suitable for the 
peeling and matching arguments we employ. 

We generalize the above arguments to more than four probes by considering appropriate 
random query schemes, and identifying properties of the resulting random graph that allow us 
to find the necessary assignment to correctly represent each possible set. 

1.1 Other related work 

Some recent work on the bit probe complexity of the set membership problem has focused on 
sets of small size. The simplest case for which tight bounds are not known is n = 2 and t = 2: 
an explicit scheme showing $s(m,2,2) = O(m^{2/3})$ was obtained by Radhakrishnan, Raman and Rao [12]. Radhakrishnan, Shah and Shannigrahi [13] showed that $s(m,2,2) = \Omega(m^{4/7})$. They also considered the complexity $s(m,n,t)$ for $n$ small as $t$ becomes large. These latter results were significantly improved by Lewenstein, Munro, Nicholson and Raman [9], who, in particular, gave interesting explicit adaptive schemes showing that for $t \ge 3$ we have

s{m,2,t) < (2* — \ 

Thus, the exponent of m in their bound for n = 2 is at most (1 -|- in contrast, the lower 

bound of Theorem 1.8 shows that the exponent is at least $\frac{1}{t-1} > (1+\frac{1}{t})\frac{1}{t}$ when the set size is much bigger than $4^t\lg m$. Furthermore, for $n > 2$, they obtain explicit schemes showing $s(m,n,t) = O(2^t m^{1/(t-\min\{2\lfloor\lg n\rfloor,\, n-3/2\})})$.


2 Two-probe upper bound: Proof of Theorem 1.4 (a)

We assume that $n \le c\lg m$ for a suitably small constant $c > 0$, for otherwise the claim follows from the trivial bound $s(m,n,2) \le m$ (taking $C$ large enough).
Our upper bound is based on dense graphs of high girth. The connection between graphs
of high girth and two-probe schemes was first noticed by Alon and Feige. They used graphs as 
templates for their query schemes, and reduced the existence of a corresponding storing scheme 
to the existence of matchings. Exploiting the expansion properties of small sets in graphs of large 
girth, they then showed that the necessary matchings do exist. Our query scheme is essentially 
the same as theirs. However, we sharpen their analysis and observe that the storing problem 
reduces to a 2-SAT instance. The underlying graph’s high girth this time implies that the 2-SAT 
instance has the necessary satisfying assignment. 

Definition 2.1 (Query graph). An $(m,s)$-query graph is a graph $G$ with three sets of vertices $A$, $A_0$ and $A_1$, each with $s$ vertices. Each vertex $u \in A$ has even degree. With each element $x \in [m]$ we associate a triple $(i(x), i_0(x), i_1(x)) \in A \times A_0 \times A_1$ such that $\{i(x), i_0(x)\}, \{i(x), i_1(x)\} \in E(G)$. We label both these edges with $x$, and require that no edge receive more than one label.

An $(m,s)$-query graph immediately gives rise to a systematic query scheme. The scheme uses three arrays $A$, $A_0$ and $A_1$, each containing $s$ bits. The query tree $T_x$ processes the query “Is $x$ in $S$?” as follows: if $A[i(x)]$ then $A_1[i_1(x)]$ else $A_0[i_0(x)]$. We use $T_G$ to refer to this query scheme. We say that the query scheme $T_G$ is satisfiable for a set $S \subseteq [m]$ if there is an assignment to the arrays $A$, $A_0$ and $A_1$ such that all queries of the form “Is $x$ in $S$?” are answered correctly by $T_G$.
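The query tree $T_x$ can be written down directly. This sketch assumes the query graph is given as triples $(i(x), i_0(x), i_1(x))$ with 0-based indices; the function name is hypothetical. It also returns the probed locations, which makes visible that the address of the second probe depends on the bit read first.

```python
def two_probe_query(triple, A, A0, A1):
    """T_x: if A[i(x)] then A1[i1(x)] else A0[i0(x)].

    Returns the answer (the last bit read, since T_G is systematic) and the probed locations."""
    i, i0, i1 = triple
    if A[i]:
        return bool(A1[i1]), [("A", i), ("A1", i1)]
    return bool(A0[i0]), [("A", i), ("A0", i0)]


A, A0, A1 = [1, 0], [0, 1], [1, 0]
print(two_probe_query((0, 1, 0), A, A0, A1))   # (True, [('A', 0), ('A1', 0)])
print(two_probe_query((1, 1, 0), A, A0, A1))   # (True, [('A', 1), ('A0', 1)])
```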

Proposition 2.2. If there is an $(m,s)$-query graph $G$ such that the query scheme $T_G$ is satisfiable for all sets $S \subseteq [m]$ of size at most $n$, then $s(m,n,2) \le 3s$.

Our claim will thus follow immediately if we establish the following two lemmas. 


Lemma 2.3. Let $G$ be an $(m,s)$-query graph and $S \subseteq [m]$. If $\mathrm{girth}(G) > 4|S|$, then $T_G$ is satisfiable for $S$.

Lemma 2.4. There is an $(m, 4m^{1-\frac{1}{4n+1}})$-query graph with girth more than $4n$.


Proof of Lemma 2.3. Fix a non-empty set $S$ of size at most $n$. We need to assign values to the bits of $A$, $A_0$ and $A_1$ so that all queries are answered correctly. Note that since our query scheme is systematic, the only constraints we have are the following.

$x \in S$:
$$\neg A[i(x)] \to A_0[i_0(x)]; \qquad (2.1)$$
$$A[i(x)] \to A_1[i_1(x)]. \qquad (2.2)$$

$y \notin S$:
$$\neg A[i(y)] \to \neg A_0[i_0(y)]; \qquad (2.3)$$
$$A[i(y)] \to \neg A_1[i_1(y)]. \qquad (2.4)$$

Let us examine the implications of the above constraints for the variables from the first array: $A[1], A[2], \ldots, A[s]$. From (2.1) and (2.3), we conclude that whenever $x \in S$ and $y \notin S$ and an edge with label $x$ and an edge with label $y$ meet in $A_0$ (that is, $i_0(x) = i_0(y)$), we have the constraint (otherwise the common location would have to hold both 1 and 0)
$$A[i(x)] \vee A[i(y)]. \qquad (2.5)$$
Similarly, from (2.2) and (2.4), if $x \in S$ and $y \notin S$, and an edge with label $x$ and an edge with label $y$ meet in $A_1$, we have the constraint
$$\neg A[i(x)] \vee \neg A[i(y)]. \qquad (2.6)$$
Let $\psi_S(A)$ be the 2-SAT instance on variables $A[1], \ldots, A[s]$ consisting of all clauses of the form (2.5) and (2.6). A satisfying assignment for $\psi_S(A)$ can be extended to the other arrays, $A_0$ and $A_1$, in order to satisfy all constraints in (2.1)-(2.4): for each $x \in S$, set $A_0[i_0(x)] = 1$ if $A[i(x)] = 0$ and $A_1[i_1(x)] = 1$ if $A[i(x)] = 1$, and set all other bits to 0. So, it suffices to show that $\psi_S(A)$ is satisfiable.

Each clause of the form $a \vee b$ is equivalent to $\neg a \to b$ and $\neg b \to a$. Furthermore, if $\psi_S(A)$ is not satisfiable, then there must be a chain of such implications from a literal to its negation (see, e.g., Aspvall, Plass and Tarjan [1]). We now observe that since our graph has large girth, such a chain cannot exist. Suppose the shortest such chain has the form
$$A[i_0] \to \neg A[i_1] \to A[i_2] \to \cdots \to A[i_{\ell-1}] \to \neg A[i_\ell],$$
where $i_\ell = i_0$ and otherwise the $i_j$'s are distinct (if they were not distinct, there would be a shorter chain). Since each clause of $\psi_S$ involves at least one element from $S$, we have $\ell \le 2|S|$. The first implication corresponds to a path of length two in $G$ from $A[i_0]$ to $A[i_1]$ via an intermediate vertex in $A_1$, the second to a path of length two in $G$ from $A[i_1]$ to $A[i_2]$ via $A_0$, and so on; the last implication corresponds to a path of length two from $A[i_{\ell-1}]$ to $A[i_\ell]$ via $A_1$. If $\ell = 1$, we have a path in $G$ from $A[i_0]$ to itself via $A_1$ (consisting of two different edges, one with label in $S$ and the other with label not in $S$), resulting in a cycle of length two, a contradiction. If $\ell > 1$, the first implication shows that there is a path of length two in $G$ from $A[i_0]$ to $A[i_1]$ via $A_1$. The remaining implications show that there is a walk of length $2(\ell-1)$ from $A[i_1]$ to $A[i_0]$ that starts with an edge from $A[i_1]$ to $A_0$. Thus, $A[i_1]$ is in a cycle in $G$ of length at most $2\ell \le 4|S|$, a contradiction.

A similar argument shows that the shortest such chain cannot be of the form $\neg A[i_0] \to A[i_1] \to \neg A[i_2] \to \cdots \to A[i_0]$. □
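The proof of Lemma 2.3 is constructive: a storing scheme can build $\psi_S(A)$, solve it by the implication-graph method of Aspvall, Plass and Tarjan, and then fill in $A_0$ and $A_1$ as in the extension step. The following is a minimal sketch, assuming 0-based triples as above and hypothetical names `solve_2sat` and `store_two_probe`; it finds strongly connected components with Tarjan's algorithm, which numbers components in reverse topological order.

```python
def solve_2sat(num_vars, clauses):
    """Solve 2-SAT via strongly connected components of the implication graph.

    A literal is (var, value); a clause ((a, va), (b, vb)) means (A[a] == va) or (A[b] == vb).
    Returns a list of booleans, or None if the instance is unsatisfiable."""
    node = lambda var, value: 2 * var + (0 if value else 1)
    graph = [[] for _ in range(2 * num_vars)]
    for (a, va), (b, vb) in clauses:
        graph[node(a, not va)].append(node(b, vb))   # not (A[a]==va) implies A[b]==vb
        graph[node(b, not vb)].append(node(a, va))   # not (A[b]==vb) implies A[a]==va

    index, low, comp, stack, on_stack = {}, {}, {}, [], set()

    def strongconnect(v):   # Tarjan's algorithm (recursive; fine for small instances)
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:                  # v roots a component: pop it off the stack
            component_id = len(set(comp.values()))
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp[w] = component_id
                if w == v:
                    break

    for v in range(2 * num_vars):
        if v not in index:
            strongconnect(v)
    if any(comp[2 * v] == comp[2 * v + 1] for v in range(num_vars)):
        return None                             # a literal and its negation imply each other
    # Reverse topological numbering: choose the literal whose component comes later.
    return [comp[2 * v] < comp[2 * v + 1] for v in range(num_vars)]


def store_two_probe(s, triples, S):
    """Storing scheme for T_G: solve psi_S(A), then extend it to A0 and A1.

    triples[x] = (i(x), i0(x), i1(x)); returns (A, A0, A1) or None if psi_S(A) is unsatisfiable."""
    clauses = []
    for x in S:
        ix, i0x, i1x = triples[x]
        for y, (iy, i0y, i1y) in enumerate(triples):
            if y in S:
                continue
            if i0x == i0y:                      # labels x and y meet in A0: clause (2.5)
                clauses.append(((ix, True), (iy, True)))
            if i1x == i1y:                      # labels x and y meet in A1: clause (2.6)
                clauses.append(((ix, False), (iy, False)))
    A = solve_2sat(s, clauses)
    if A is None:
        return None
    A0, A1 = [0] * s, [0] * s
    for x in S:                                 # only members of S need a 1 somewhere
        ix, i0x, i1x = triples[x]
        if A[ix]:
            A1[i1x] = 1
        else:
            A0[i0x] = 1
    return [int(bit) for bit in A], A0, A1
```

Because clauses (2.5) and (2.6) capture every conflict exactly, the sketch returns `None` precisely when no assignment of $A$, $A_0$, $A_1$ answers all queries correctly, whatever the girth; the girth condition of Lemma 2.3 is what rules out the `None` case.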

Proof of Lemma 2.4. Let $G = (V_1, V_2, E)$ be a bipartite graph of girth $g$, with $|V_1| = |V_2| = s$, and each vertex in $V_1$ of even degree. (Later, we will indicate how such graphs $G$ can be obtained.) Let
$$V_1 = \{V_1[1], V_1[2], \ldots, V_1[s]\}; \qquad V_2 = \{V_2[1], V_2[2], \ldots, V_2[s]\}.$$
Consider the $(|E|/2, s)$-query graph $H$ constructed as follows. $H$ has three vertex sets $A$, $A_0$ and $A_1$. $A$ will be a copy of $V_1$, and $A_0$ and $A_1$ will be copies of $V_2$. Half the edges of $G$ between $V_1$ and $V_2$ will be placed between $A$ and $A_0$ and the rest between $A$ and $A_1$. More precisely, suppose the neighbors of $V_1[i]$ are $V_2[y_1], V_2[y_2], \ldots, V_2[y_d]$. Then, for $j = 1, 2, \ldots, d/2$, we include edges $\{A[i], A_0[y_{2j}]\}$ and $\{A[i], A_1[y_{2j-1}]\}$ in $H$; furthermore, these two edges will have the same label $x \in [m]$ (a distinct label for each such pair). It is immediate that $H$ is an $(|E(G)|/2, s)$-query graph with girth at least $g$. Thus, it is enough to exhibit a bipartite graph $G$ with $|V_1| = |V_2| = O(m^{1-\frac{1}{4n+1}})$, $|E(G)| \ge 2m$, $\mathrm{girth}(G) > 4n$ and all vertices in $V_1$ of even degree.


Dense graphs of large girth: A probabilistic argument (due to Erdős) establishes the existence of such graphs. Let $k = 4n$ (which is at most a small constant multiple of $\lg m$), and consider the following random bipartite graph $G$ on vertex sets $V_1$ and $V_2$, each with $s = 4m^{1-\frac{1}{k+1}}$ vertices. Let $d$ be the largest even number at most $s^{1/k}$; thus $s^{1/k} \ge d \ge s^{1/k} - 2 \ge 2$. For each vertex $x \in V_1$, we assign $d$ distinct neighbors from $V_2$, chosen uniformly at random. Then, the expected number of short cycles (of length at most $k$) in $G$ is at most
$$\sum_{\ell=2}^{k/2} \frac{s^{2\ell}}{2\ell}\left(\frac{d}{s}\right)^{2\ell} \le \frac{1}{4}\sum_{\ell=0}^{\infty} d^{k-2\ell} \le \frac{1}{3}d^k \le \frac{s}{2}.$$
Thus, there is such a graph with at most $s/2$ short cycles. Consider each such cycle one by one, and for each pick one of its vertices in $V_1$ and delete both edges from the cycle incident on it. Then, the average degree in $V_1$ is at least $d - 1 \ge s^{1/k} - 3 \ge \frac{1}{2}s^{1/k}$, and $|E(G)| \ge \frac{1}{2}s^{1+1/k} \ge 2m$.


Using this in the above construction (together with Proposition 2.2) we obtain $s(m,n,2) \le 3 \cdot 4m^{1-\frac{1}{4n+1}}$. □
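For small experiments with the construction in the proof of Lemma 2.4, the sketch below swaps in a simpler greedy method in place of the paper's random-graph-plus-deletion argument: it scans candidate edges in random order and keeps an edge only if it closes no cycle of length at most $k$, then removes one edge at every odd-degree vertex of $V_1$ so that all $V_1$-degrees are even, and finally pairs the neighbours of each $V_1[i]$ into triples exactly as in the definition of $H$. It makes no claim to reach the density $s^{1+1/k}$ of the probabilistic argument; all names and parameters are hypothetical.

```python
import random
from collections import deque


def bfs_dist(adj, src, limit):
    """Distances from src to every vertex within `limit` steps."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        if dist[v] == limit:
            continue
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def greedy_high_girth(s, k, seed=0):
    """Bipartite graph on V1 = {("L", i)} and V2 = {("R", j)} with girth > k, even V1-degrees."""
    rng = random.Random(seed)
    adj = {(side, i): set() for side in "LR" for i in range(s)}
    candidates = [(i, j) for i in range(s) for j in range(s)]
    rng.shuffle(candidates)
    for i, j in candidates:
        u, w = ("L", i), ("R", j)
        # Adding {u, w} closes cycles of length dist(u, w) + 1, so require dist(u, w) >= k.
        if w not in bfs_dist(adj, u, k - 1):
            adj[u].add(w)
            adj[w].add(u)
    for i in range(s):
        u = ("L", i)
        if len(adj[u]) % 2 == 1:        # deleting an edge never creates a cycle
            w = min(adj[u])
            adj[u].remove(w)
            adj[w].remove(u)
    return adj


def girth(adj):
    """Length of a shortest cycle (infinity for a forest), by BFS from every vertex."""
    best = float("inf")
    for root in adj:
        dist, parent = {root: 0}, {root: None}
        queue = deque([root])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w], parent[w] = dist[v] + 1, v
                    queue.append(w)
                elif parent[v] != w:
                    best = min(best, dist[v] + dist[w] + 1)
    return best


def query_triples(adj, s):
    """Pair the sorted neighbours y_1 < y_2 < ... of each V1[i] as in the proof of Lemma 2.4:
    the pair (y_{2j-1}, y_{2j}) yields the triple (i, i0, i1) = (i, y_{2j}, y_{2j-1})."""
    triples = []
    for i in range(s):
        ys = sorted(j for _, j in adj[("L", i)])
        for a in range(0, len(ys), 2):
            triples.append((i, ys[a + 1], ys[a]))
    return triples
```

The resulting triples can be fed to a two-probe storing routine; each edge of $G$ is used by exactly one triple, so no edge receives more than one label.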


Explicit schemes: Two-probe schemes can be obtained more explicitly using the following 
construction of graphs of large girth. 

Proposition 2.5 (see Proposition 2.1 of F. Lazebnik, V. A. Ustimenko, and A. J. Woldar [10]). Let $q$ be a prime power and $k \ge 1$ be an odd integer. Then, there is a graph $D(k,q)$ that is

(i) $q$-regular and of order $2q^k$;

(ii) of girth at least $k + 5$.

Now assume $n < \frac{1}{4}(\log m)^{1/3}$. Set $k = 4n - 3$, and choose $q = 2^j$ such that $(2m)^{1/(k+1)} \le q < 2(2m)^{1/(k+1)}$. Then, $D(k,q)$ is a bipartite graph on vertex sets $(V_1, V_2)$ with girth at least $4n + 2$, at least $2m$ edges and
$$|V_1| = |V_2| = s = q^k < \left(2(2m)^{1/(k+1)}\right)^k.$$
If $n < \frac{1}{4}(\log m)^{1/3}$, we have $(k+1)^2(k+2) < \log m$, and we have


3 Two-probe lower bound: Proof of Theorem 1.4 (b)

Since $s(m,n,2)$ is a non-decreasing function of $n$, it is enough to establish the claim for $n \le \lg m$.

Proposition 3.1. If there is an $(m,n,s,t)$-scheme, then there is a systematic $(m,n,2s,t)$-scheme.

So, from now on, we will assume that our schemes are systematic. 

Definition 3.2 (Bipartite graph $H_\Phi$, pseudo-graph $G_\Phi$). Fix a systematic $(m,n,s,2)$-scheme $\Phi$. We will associate the following bipartite graph $H_\Phi$ with such a scheme. There will be two sets of vertices, each with $s$ elements: $A_0$ and $A_1$. Each edge of $H_\Phi$ will have a color and a label. We include the edge $\{A_0[j], A_1[k]\}$ with label $x$ and color $i$ if, on query “Is $x$ in $S$?”, the first probe is made to location $i$, and if it returns 0, the second probe is made to location $j$, and if the first probe returns 1, the second probe is made to location $k$. $H_\Phi$ thus has $2s$ vertices and $m$ edges, which are colored using $s$ colors.

The pseudo-graph $G_\Phi$ is a bipartite graph obtained from $H_\Phi$ as follows. $G_\Phi$ and $H_\Phi$ have the same set of vertices. The edges of $G_\Phi$ are obtained as follows. Consider the edges of color $a$ in $H_\Phi$. We partition these edges into ordered pairs (excluding one edge if the number of edges of this color is odd). For each such pair we include a pseudo-edge in $G_\Phi$ as follows. Let $(e, e')$ be one such pair; suppose $e = \{u, v\}$ has label $x$, $e' = \{u', v'\}$ has label $x'$, $u, u' \in A_0$ and $v, v' \in A_1$. Then, in $G_\Phi$ we include the edge $\{u, v'\}$ with label $\{(u, x), (v', x')\}$ (we do not include the edge $\{u', v\}$). We repeat this for all colors $a$. Thus $G_\Phi$ is a bipartite graph with at least $(m-s)/2$ edges. For a set of edges $P$ of $G_\Phi$, let $\mathrm{lab}(P) \subseteq [m]$ be the set of elements of the universe that appear in the label of some edge in $P$.
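The pairing in Definition 3.2 is mechanical, and a short sketch makes the edge count visible: each color class of size $c$ contributes $\lfloor c/2 \rfloor \ge (c-1)/2$ pseudo-edges, hence at least $(m-s)/2$ in total. The sketch assumes the systematic scheme is described by hypothetical 0-based triples $(i, j, k)$ per element (first probe $i$; second probe $j$ after a 0, $k$ after a 1), and pairs edges of a color in the order they are listed, which is one arbitrary choice of ordered pairs.

```python
from collections import defaultdict


def pseudo_graph(triples):
    """Pseudo-graph G_Phi of Definition 3.2 for a systematic two-probe scheme.

    triples[x] = (i, j, k): the first probe reads location i; the second reads j if
    that bit was 0 and k if it was 1. Returns pseudo-edges (u, v', ((u, x), (v', x')))."""
    by_color = defaultdict(list)
    for x, (i, j, k) in enumerate(triples):
        # Edge {A0[j], A1[k]} of H_Phi with label x and color i.
        by_color[i].append((("A0", j), ("A1", k), x))
    pseudo = []
    for edges in by_color.values():
        # Pair consecutive edges of the same color; an odd leftover edge is dropped.
        for (u, _v, x), (_u2, v2, x2) in zip(edges[0::2], edges[1::2]):
            pseudo.append((u, v2, ((u, x), (v2, x2))))
    return pseudo
```

Each element labels at most one pseudo-edge, which is why edge-disjoint cycles of $G_\Phi$ have disjoint label sets.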
3.1 Forcing

Lemma 3.3 (Forcing lemma). Let $G_\Phi$ be a pseudo-graph associated with a systematic scheme $\Phi$. Let $C$ be a cycle in $G_\Phi$ starting at vertex $v$. Let $b \in \{0,1\}$. Then there are disjoint subsets $S_0, S_1 \subseteq \mathrm{lab}(C)$, each with at most $|C| + 1$ elements, such that in any representation under $\Phi$ of a set $S$ such that $S_1 \subseteq S \subseteq [m] \setminus S_0$, location $v$ must be assigned $b$. Further, if $b = 0$ then $|S_1| = |C| - 1$, and if $b = 1$, then $|S_1| = |C| + 1$.

Proof. First, consider the claim with $b = 1$. Suppose the cycle is
$$v = v_0 \xrightarrow{e_1} v_1 \xrightarrow{e_2} \cdots \xrightarrow{e_{k-1}} v_{k-1} \xrightarrow{e_k} v_k = v_0,$$
and the pseudo-edge $e_j$ has label $\{(v_{j-1}, a_j), (v_j, b_j)\}$. Let $S_1 = \{a_1, a_2, \ldots, a_{k-1}, a_k, b_k\}$ (it has $k+1$ elements) and $S_0 = \{b_1, b_2, \ldots, b_{k-1}\}$. We claim that if the scheme represents a set $S$ such that
$$S_1 \subseteq S \subseteq [m] \setminus S_0,$$
then location $v$ must be assigned 1. Suppose location $v$ is assigned 0. Recall that the scheme is systematic. Since $a_1 \in S$ and $b_1 \notin S$, we conclude from the definition of the pseudo-edge $e_1$ that $v_1$ must also be assigned 0 (the common first-probe location of $a_1$ and $b_1$ cannot send $a_1$ to $v_0$, which holds 0, so it sends $b_1$ to $v_1$, which must then hold 0). Using the subsequent edges in the cycle, we conclude that the locations $v_0, \ldots, v_{k-1}$ must all be assigned 0. Now, however, $a_k, b_k \in S$, so 1 must be assigned to location $v_k = v$, a contradiction, for we assumed that $v$ was assigned 0. This proves our claim for $b = 1$. For $b = 0$, we take $S_1 = \{b_1, b_2, \ldots, b_{k-1}\}$ (it has $k-1$ elements) and $S_0 = \{a_1, \ldots, a_{k-1}, a_k, b_k\}$, and reason as before. □
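The forcing argument can be confirmed by exhaustive search on a toy scheme. The sketch below is a hypothetical 8-bit systematic scheme with four elements (not taken from the paper) whose pseudo-graph contains a 2-cycle between vertex $A_0[2]$ (location 2) and vertex $A_1[3]$ (location 3): colors 0 and 1 each pair two elements. `forcing_sets` builds $(S_0, S_1)$ exactly as in the proof, and the search checks that every valid memory for $S = S_1$ assigns $b$ to location 2.

```python
from itertools import product


def forcing_sets(cycle, b):
    """Sets (S0, S1) from the proof of Lemma 3.3.

    cycle = [(a_1, b_1), ..., (a_k, b_k)], where pseudo-edge e_j has label
    {(v_{j-1}, a_j), (v_j, b_j)}. Elements of S1 must be in S; elements of S0 must not."""
    a_labels = {aj for aj, _ in cycle}
    b_labels = [bj for _, bj in cycle]
    forced_in = a_labels | {b_labels[-1]}       # a_1, ..., a_k and b_k
    rest = set(b_labels[:-1])                   # b_1, ..., b_{k-1}
    return (rest, forced_in) if b == 1 else (forced_in, rest)


def answer(triples, memory, x):
    """Systematic two-probe query: read location i, then j (if 0) or k (if 1)."""
    i, j, k = triples[x]
    return memory[k] if memory[i] else memory[j]


# Toy scheme (hypothetical): element x is described by (first, second_if_0, second_if_1).
triples = [(0, 2, 4), (0, 5, 3), (1, 2, 6), (1, 7, 3)]
# e_1 = {A0[2], A1[3]} from color 0 with a_1 = 0, b_1 = 1;
# e_2 = {A0[2], A1[3]} from color 1, traversed from A1[3] back, with a_2 = 3, b_2 = 2.
cycle = [(0, 1), (3, 2)]
print(forcing_sets(cycle, 1))   # ({1}, {0, 2, 3})
```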

Corollary 3.4. If a scheme stores sets of size up to 2k, then the pseudo-graph associated with 
the scheme cannot have two edge-disjoint cycles of length at most k each that have a vertex in 
common. 


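Proof. Suppose there are two such edge-disjoint cycles, $C_1$ and $C_2$, both starting at $v$. We apply Lemma 3.3 with $b = 0$ and obtain sets $S_0, S_1 \subseteq \mathrm{lab}(C_1)$. Next we apply Lemma 3.3 with $b = 1$ and obtain sets $T_0, T_1 \subseteq \mathrm{lab}(C_2)$. Since each element labels at most one pseudo-edge, $\mathrm{lab}(C_1)$ and $\mathrm{lab}(C_2)$ are disjoint. Now, consider $S = S_1 \cup T_1$, a set of size at most $2k$. When the scheme stores the set $S$, then location $v$ must be assigned a 0 (because $S_1 \subseteq S \subseteq [m] \setminus S_0$), and also 1 (because $T_1 \subseteq S \subseteq [m] \setminus T_0$), a contradiction. □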


3.2 Calculation 

Our lower bound will use Corollary 3.4 as follows. We will show that if a scheme uses small space to represent sets of size up to $n$ from a universe of size $m$, then its pseudo-graph must be dense (for it must accommodate about $m/2$ edges). In such a dense graph there must be short cycles. In fact, if $m \gg s^{1+1/\lfloor n/4\rfloor}$, then we can ensure that there are two edge-disjoint cycles of length at most $n/2$ each that have a vertex in common. But then Corollary 3.4 states that such a scheme does not exist.

To make the above argument precise, we will use the following proposition, which is a consequence of a theorem of Alon, Hoory and Linial [3] (see also Ajesh Babu and Radhakrishnan [6]).

Proposition 3.5. Let $G$ be a bipartite graph with average degree $d \ge 2$ and girth greater than $n/2$ (for a positive integer $n \ge 4$). Then, $d \le (|V(G)|/2)^{1/\lfloor n/4\rfloor} + 1$.

Proof. Let $n = 4p + q$, for $p = \lfloor n/4\rfloor$ and $q \in \{0,1,2,3\}$. From our assumption, the girth of $G$ is greater than $n/2 \ge 2p$. Since $G$ is bipartite, its girth is even, so $G$ has girth at least $2(p+1)$. The result of Alon, Hoory and Linial then immediately implies that $|V(G)| \ge 2(d-1)^p$. Since $p = \lfloor n/4\rfloor$, our claim follows from this. □
Corollary 3.6. Suppose $G$ is a bipartite graph with $|V(G)| \ge \lfloor n/4\rfloor$, $|E(G)| \ge (|V(G)|/2)^{1+1/\lfloor n/4\rfloor} + \frac{3}{2}|V(G)|$ and $(|V(G)|/2)^{1/\lfloor n/4\rfloor} \ge 2$. Then, $G$ has two edge-disjoint cycles of length at most $n/2$ that have at least one vertex in common.

Proof. If $|E(G)| \ge (|V(G)|/2)^{1+1/\lfloor n/4\rfloor} + |V(G)|/2$, then the average degree is at least $(|V(G)|/2)^{1/\lfloor n/4\rfloor} + 1 > 2$. By the proposition above, $G$ has a cycle of length $\ell_1 \le n/2$. Remove this cycle, and consider the remaining graph, which has at least $(|V(G)|/2)^{1+1/\lfloor n/4\rfloor} + (3|V(G)|/2 - \ell_1)$ edges. Again, we find a cycle of length $\ell_2 \le n/2$. We may continue in this way, finding cycles of length $\ell_i$ ($i = 1, 2, \ldots$), until the sum of the $\ell_i$'s exceeds $|V(G)|$. At that point, two of the cycles we found must intersect (for there are only $|V(G)|$ vertices). □

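Proof of Theorem 1.4 (b). Fix an $(m,n,s,2)$-scheme. Assume $m$ is large. Using Proposition 3.1, we obtain a systematic $(m,n,2s,2)$-scheme, say $\Phi$, and consider the corresponding pseudo-graph $G_\Phi$ (note $|V(G_\Phi)| = 4s$ and, since $\Phi$ uses $2s$ colors, $|E(G_\Phi)| \ge (m-2s)/2$). The lower bound of Buhrman et al. [5] implies that $s \ge \sqrt{m}$; thus $(2s)^{1/\lfloor n/4\rfloor} \ge 2$; also since we assume $n \le \lg m$, we have $4s \ge \lfloor n/4\rfloor$. By applying Corollary 3.6 and Corollary 3.4 to $G_\Phi$, we conclude that
$$\frac{m-2s}{2} \le |E(G_\Phi)| < (2s)^{1+\frac{1}{\lfloor n/4\rfloor}} + 6s.$$
Thus (recall from above that $(2s)^{1/\lfloor n/4\rfloor} \ge 2$),
$$m < 14s + 2(2s)^{1+\frac{1}{\lfloor n/4\rfloor}} \le (7s)^{1+\frac{1}{\lfloor n/4\rfloor}} + (4s)^{1+\frac{1}{\lfloor n/4\rfloor}} \le (11s)^{1+\frac{1}{\lfloor n/4\rfloor}}.$$
By raising both sides to the power $1 - \frac{1}{\lfloor n/4\rfloor+1}$ and rearranging the inequality, we obtain $s > \frac{1}{11}m^{1-\frac{1}{\lfloor n/4\rfloor+1}}$. □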
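The exponent arithmetic in the last step can be written out as follows, with $k = \lfloor n/4\rfloor$: raising to the power $\frac{k}{k+1} = 1 - \frac{1}{k+1}$ exactly cancels the exponent $1 + \frac{1}{k}$.

```latex
\[
  m < (11s)^{1+\frac{1}{k}}
  \;\Longrightarrow\;
  m^{\frac{k}{k+1}} < (11s)^{\left(1+\frac{1}{k}\right)\frac{k}{k+1}} = 11s
  \;\Longrightarrow\;
  s > \tfrac{1}{11}\, m^{1-\frac{1}{k+1}},
  \qquad k = \lfloor n/4 \rfloor .
\]
```

When $k + 1 \ge \frac{1}{4}\lg m$, the factor $m^{-1/(k+1)} = 2^{-\lg m/(k+1)}$ is at least $2^{-4}$, so the bound is $\Omega(m)$; this is the sense in which $s(m,n,2) = \Omega(m)$ once $n$ is of order $\lg m$.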

4 Three-probe upper bound: Proof of Theorem 1.5

As in the case of two probes, our three-probe scheme will be based on the existence of certain graphs. The framework we use will be general and also applicable to schemes that make $t \ge 4$ probes. We will present the general framework first, and specialize it to $t = 3$ when we describe our proof.

Definition 4.1. An $(m,s,t)$-graph is a bipartite graph $G$ with vertex sets $U = [m]$ and $V$ ($|V| = (2^t - 1)s$). $V$ is partitioned into $2^t - 1$ disjoint sets: $A, A_0, A_1, A_{00}, \ldots$, one $A_\sigma$ for each $\sigma \in \{0,1\}^{<t}$ (with $A = A_\sigma$ for the empty string $\sigma$); each $A_\sigma$ has $s$ vertices. Between each $u \in U$ and each $A_\sigma$ there is exactly one edge. For $i = 1, \ldots, t$, the subgraph of $G$ induced by $U$ and $V_i = \bigcup_{\sigma : |\sigma| = i-1} A_\sigma$ will be referred to as $G_i$. An $(m,s,t)$-graph naturally gives rise to a systematic $(m, (2^t-1)s, t)$-query scheme $T_G$ as follows. We view the memory (an array $L$ of $(2^t - 1)s$ bits) as being indexed by vertices in $V$. For query element $u \in U$, if the first $i - 1$ probes resulted in values $\sigma \in \{0,1\}^{i-1}$, then the $i$th probe is made to the location indexed by the unique neighbor of $u$ in $A_\sigma$. In particular, the $i$th probe is made at a location in $V_i$.
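The query scheme $T_G$ of Definition 4.1 is easy to simulate. This sketch assumes memory is a dictionary keyed by vertices $(\sigma, j)$ of $V$, with $\sigma$ a bit-string and $j \in \{0, \ldots, s-1\}$; the graph is sampled as in the proof of Lemma 4.4 below (one uniform, independent neighbour in every $A_\sigma$). The function names are hypothetical.

```python
import random
from itertools import product


def random_graph(m, s, t, seed=0):
    """Random (m, s, t)-graph: element u gets one uniform neighbour in each A_sigma, |sigma| < t.

    Neighbours are stored as nbr[u][sigma] = index within A_sigma (sigma is a bit-string)."""
    rng = random.Random(seed)
    sigmas = ["".join(bits) for length in range(t) for bits in product("01", repeat=length)]
    return [{sigma: rng.randrange(s) for sigma in sigmas} for _ in range(m)]


def query(nbr_u, memory, t):
    """T_G for one element: probe i goes to u's neighbour in A_sigma, sigma = bits read so far."""
    sigma = ""
    for _ in range(t):
        bit = memory[(sigma, nbr_u[sigma])]   # memory is indexed by vertices (sigma, j) of V
        sigma += str(bit)
    return bit                                 # systematic: the answer is the last bit read


G = random_graph(10, 4, 3, seed=1)
print(len(G[0]))   # 7, i.e. 2^3 - 1 sets A_sigma
```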

We say that the query scheme $T_G$ is satisfiable for a set $S \subseteq [m]$ if there is an assignment to the memory locations $(L[v] : v \in V)$ such that $T_G$ correctly answers all queries of the form “Is $x$ in $S$?”.

We now restrict attention to $t = 3$ probes. First, we identify an appropriate property of the underlying $(m,s,3)$-graph $G$ that guarantees that $T_G$ is satisfiable for all sets $S$ of size at most $n$. We then show that such a graph does exist for some $s = O(\sqrt{mn\lg(2m/n)})$.

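Definition 4.2 (Admissible graph). We say that an $(m,s,3)$-graph $G$ is admissible for sets of size at most $n$ if

(P1) $\forall R \subseteq [m]\ (|R| \le n + \lceil n\lg(2m/n)\rceil)$: $|\Gamma_G(R)| \ge 5|R|$, where $\Gamma_G(R)$ is the set of neighbors of $R$ in $G$.

(P2) $\forall S \subseteq [m]\ (|S| \le n)$, $\forall R \subseteq [m] \setminus S\ (|R| \ge \lceil n\lg(2m/n)\rceil)$ $\exists y \in R$:
$$\left(\Gamma_{G_3}(y) \cap \Gamma_{G_3}(S) = \emptyset\right) \text{ OR } \left(|\Gamma_{G_3}(y) \cap \Gamma_{G_3}(S)| = 1 \text{ and } |\Gamma_{G_1 \cup G_2}(y) \cap \Gamma_{G_1 \cup G_2}((R \cup S) \setminus \{y\})| \le 1\right).$$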


Lemma 4.3. If an $(m,s,3)$-graph $G$ is admissible for sets of size at most $n$, then the $(m,7s,3)$-query scheme $T_G$ is satisfiable for every $S \subseteq [m]$ of size at most $n$.

Lemma 4.4. There is an $(m,s,3)$-graph with $s = 500\sqrt{mn\lg(2m/n)}$ that is admissible for sets of size at most $n$.


Proof of Lemma 4.3. Fix an $(m,s,3)$-graph $G$ that is admissible for sets of size at most $n$. Thus, $G$ satisfies (P1) and (P2) above. Fix a set $S$ of size at most $n$. Suppose $T_G$ is not satisfiable for $S$. Then, there is a minimal set $T \subseteq [m] \setminus S$ such that no assignment makes $T_G$ answer the queries for all $u \in S \cup T$ correctly. We have two cases.

$|S \cup T| \le n + \lceil n\lg(2m/n)\rceil$: We use an idea from Alon and Feige [2]. From (P1) and Hall’s theorem, we may assign to each element $u \in S \cup T$ a set $V_u \subseteq \Gamma_G(u)$ such that (i) $|V_u| = 5$ and (ii) the $V_u$'s are disjoint. It can be verified that in a binary decision tree of depth 3 and any value $b \in \{0,1\}$, given any set of five nodes, values can be assigned to those nodes to ensure that the tree returns the value $b$. Thus, there is an assignment (fixing five bits for each $u \in S \cup T$) so that $T_G$ returns the correct answer for all $u \in S \cup T$, contradicting our choice of $T$.


$|S \cup T| > n + \lceil n\lg(2m/n)\rceil$: Thus, $|T| > \lceil n\lg(2m/n)\rceil$. From property (P2), we conclude that there is a $y \in T$ such that one of the following holds.

(a) $\Gamma_{G_3}(y) \cap \Gamma_{G_3}(S) = \emptyset$, or

(b) $|\Gamma_{G_3}(y) \cap \Gamma_{G_3}(S)| = 1$ and $|\Gamma_{G_1\cup G_2}(y) \cap \Gamma_{G_1\cup G_2}((S \cup T) \setminus \{y\})| \le 1$.

By the minimality of $T$, there is an assignment $\alpha \in \{0,1\}^V$ so that $T_G$ correctly answers queries for all elements $u \in S \cup T \setminus \{y\}$. In case (a), modify $\alpha$ so that all locations in $\Gamma_{G_3}(y)$ (the locations that are probed in the third step by $T_y$) are 0. For the new assignment $\alpha'$, the query for $y$ is clearly answered correctly; the operation of $T_u$ for $u \in S \cup T \setminus \{y\}$ is identical in $\alpha$ and $\alpha'$ (elements of $S$ never read these locations in the third step, and an element of $T$ that reads one of them already reads 0 there). This again contradicts the choice of $T$.

In case (b), we again start with an assignment $\alpha \in \{0,1\}^V$ so that $T_G$ correctly answers queries for all elements $u \in S \cup T \setminus \{y\}$. Now, to accommodate $y$, we will modify $\alpha$ to $\alpha'$ by making changes to locations in $V_1$, $V_2$ and $V_3$. We have exactly one $\ell \in V_3$ such that $\ell \in \Gamma_{G_3}(y) \cap \Gamma_{G_3}(S)$; in $\alpha'$ all locations in $\Gamma_{G_3}(y)$ other than $\ell$ are set to 0. Furthermore, at least two of the three locations in $\Gamma_{G_1\cup G_2}(y)$ lie outside $\Gamma_{G_1\cup G_2}((S \cup T) \setminus \{y\})$; so we may modify them without affecting the operation of any decision tree $T_u$ for $u \in (S \cup T) \setminus \{y\}$. In $\alpha'$, we assign these values appropriately so that the third probe of $T_y$ is not made to $\ell$. We have thus ensured that under assignment $\alpha'$, queries for all $u \in S \cup T$ are answered correctly, again contradicting the choice of $T$.

□ 
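The claim used in the first case of the proof of Lemma 4.3, that any five of the seven nodes of a depth-3 decision tree suffice to force its output, can be confirmed by exhaustive search. The sketch models the systematic tree of $T_G$ (the answer is the third bit read) with nodes named by the bits read before reaching them; it also exhibits a set of four nodes that does not suffice. The function names are hypothetical.

```python
from itertools import combinations, product

# Nodes of a depth-3 decision tree, named by the bits read before reaching them.
NODES = ["", "0", "1", "00", "01", "10", "11"]


def run(values):
    """Systematic depth-3 tree: follow the bits read and return the third one."""
    sigma = ""
    for _ in range(3):
        sigma += str(values[sigma])
    return int(sigma[-1])


def can_force(fixed, b):
    """Can values on the nodes in `fixed` make the tree return b whatever the other nodes hold?"""
    free = [v for v in NODES if v not in fixed]
    for choice in product((0, 1), repeat=len(fixed)):
        chosen = dict(zip(fixed, choice))
        if all(run({**chosen, **dict(zip(free, rest))}) == b
               for rest in product((0, 1), repeat=len(free))):
            return True
    return False


print(all(can_force(F, b) for F in combinations(NODES, 5) for b in (0, 1)))   # True
print(can_force(("1", "01", "10", "11"), 1))                                   # False
```

In the failing example the root, node “0” and node “00” are all free, so an adversary can route the query to the uncontrolled leaf.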


Proof of Lemma 4.4. We show that a suitable random $(m,s,3)$-graph $G$ is admissible with positive probability, for $s = 500\sqrt{mn\lg(2m/n)}$. The graph $G$ is constructed as follows. Recall that $V = \bigcup_{\sigma \in \{0,1\}^{<3}} A_\sigma$. For each $u \in U$, one neighbor is chosen uniformly and independently from each $A_\sigma$.
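(P1) holds. If (P1) fails, then for some non-empty $W \subseteq U$ ($|W| \le n + \lceil n\lg(2m/n)\rceil$), we have $|\Gamma_G(W)| \le 5|W| - 1$. Fix a set $W$ of size $r \ge 1$ and $L \subseteq V$ of size at most $5r - 1$. Let $L$ have $\ell_z$ elements in $A_z$. Then,

$$\Pr[\Gamma_G(W) \subseteq L] \le \prod_{z} \left(\frac{\ell_z}{s}\right)^{r} \le \left(\frac{5r-1}{7s}\right)^{7r},$$

where the last step uses GM $\le$ AM (the seven values $\ell_z$ sum to at most $5r-1$).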


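If $s \ge 500\sqrt{mn\lg(2m/n)}$, then we conclude, using the union bound over choices of $W$ and $L$, that the probability that (P1) fails is at most

$$
\begin{aligned}
\sum_{r=1}^{n+\lceil n\lg(2m/n)\rceil} \binom{m}{r}\binom{7s}{5r-1}\left(\frac{5r-1}{7s}\right)^{7r}
&\le \sum_{r=1}^{n+\lceil n\lg(2m/n)\rceil} \left(\frac{em}{r}\right)^{r}\left(\frac{7es}{5r-1}\right)^{5r-1}\left(\frac{5r-1}{7s}\right)^{7r}\\
&\le \sum_{r=1}^{n+\lceil n\lg(2m/n)\rceil} \frac{5r}{7es}\left(\frac{5^2 e^6 m r}{7^2 s^2}\right)^{r}\\
&\le \frac{1}{3}.
\end{aligned}
$$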


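(P2) holds. For (P2) to fail, there must be disjoint sets $S, R \subseteq U$, where $|S| = n' \le n$ and $|R| = r \ge \lceil n\lg(2m/n)\rceil$, for which the condition specified in Definition 4.2 does not hold. Then $R = R_1 \cup R_2$, where $R_1 = \{y \in R : |\Gamma_{G_3}(y) \cap \Gamma_{G_3}(S)| = 1\}$ and $R_2 = \{y \in R : |\Gamma_{G_3}(y) \cap \Gamma_{G_3}(S)| \ge 2\}$; let $r_1 = |R_1|$ and $r_2 = |R_2|$. Furthermore, for $y \in R_1$ we have $|\Gamma_{G_1 \cup G_2}(y) \cap \Gamma_{G_1 \cup G_2}((R \cup S) \setminus \{y\})| \ge 2$. This implies that $|\Gamma_{G_1 \cup G_2}(R \cup S)| \le 3(n'+r) - r_1$. Fix $R_1 \subseteq R$, let $R_2 = R \setminus R_1$, and define events $\mathcal{E}_1$, $\mathcal{E}_2$ and $\mathcal{E}^*$ as follows:

$$\mathcal{E}_1 \equiv \forall y \in R_1:\ |\Gamma_{G_3}(y) \cap \Gamma_{G_3}(S)| = 1;$$
$$\mathcal{E}_2 \equiv \forall y \in R_2:\ |\Gamma_{G_3}(y) \cap \Gamma_{G_3}(S)| \ge 2;$$
$$\mathcal{E}^* \equiv |\Gamma_{G_1 \cup G_2}(R \cup S)| \le 3(n'+r) - r_1.$$

By the union bound, the probability that (P2) fails is bounded by the sum of $\Pr[\mathcal{E}_1]\Pr[\mathcal{E}_2]\Pr[\mathcal{E}^*]$ taken over all valid choices of $S$, $R$, $R_1$ and $R_2$. Each $y$ has one neighbour in each of the four blocks of $V_3$, and $\Gamma_{G_3}(S)$ contains at most $n'$ vertices of each block. Hence $\Pr[\mathcal{E}_1] \le (4n'/s)^{r_1}$ and $\Pr[\mathcal{E}_2] \le \bigl(\binom{4}{2}(n'/s)^2\bigr)^{r_2} \le (3n'/s)^{2r_2}$. To bound $\Pr[\mathcal{E}^*]$, we proceed as we did above for (P1). We have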


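$$\Pr[\mathcal{E}^*] \le \binom{3s}{3(n'+r)-r_1}\left(\frac{3(n'+r)-r_1}{3s}\right)^{3(n'+r)} \le \left(\frac{3es}{3(n'+r)-r_1}\right)^{3(n'+r)-r_1}\left(\frac{3(n'+r)-r_1}{3s}\right)^{3(n'+r)} = e^{3(n'+r)-r_1}\left(\frac{3(n'+r)-r_1}{3s}\right)^{r_1}.$$

Thus,

$$\Pr[\mathcal{E}_1]\Pr[\mathcal{E}_2]\Pr[\mathcal{E}^*] \le \left(\frac{4n'}{s}\right)^{r_1}\left(\frac{3n'}{s}\right)^{2r_2} e^{3(n'+r)-r_1}\left(\frac{3(n'+r)-r_1}{3s}\right)^{r_1} \le \frac{e^{3(n'+r)}\, 9^{r}\, n'^{\,r+r_2}\,(n'+r)^{r_1}}{s^{2r}} \le \left(\frac{9e^6\, n(n+r)}{s^2}\right)^{r},$$

where the last step uses $n' \le n \le r$ (recall that $r \ge \lceil n\lg(2m/n)\rceil \ge n$).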
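Using the union bound, we conclude that

$$
\begin{aligned}
\Pr[\text{(P2) fails}]
&\le \sum_{n'=1}^{n}\ \sum_{r \ge \lceil n\lg(2m/n)\rceil} \binom{m}{n'}\binom{m}{r}\sum_{r_1=0}^{r}\binom{r}{r_1}\left(\frac{9e^6 n(n+r)}{s^2}\right)^{r}\\
&\le \sum_{n'=1}^{n}\ \sum_{r \ge \lceil n\lg(2m/n)\rceil} \left(\frac{em}{n'}\right)^{n'}\left(\frac{em}{r}\right)^{r} 2^{r}\left(\frac{9e^6 n(n+r)}{s^2}\right)^{r}\\
&\le \sum_{n'=1}^{n}\ \sum_{r \ge \lceil n\lg(2m/n)\rceil} \left(\frac{em}{n'}\right)^{n'}\left(\frac{36e^7 mn}{s^2}\right)^{r}\\
&\le \frac{1}{3} \qquad \left(\text{for } s \ge 500\sqrt{mn\lg(2m/n)} \text{ and } m \text{ large}\right).
\end{aligned}
$$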

Thus, with probability at least 1/3 the random graph $G$ is admissible.


□ 


5 Lower bound: Proof of Theorem 1.8

Our theorem follows immediately from the following lemma. 

Lemma 5.1. Suppose t ≥ 2, and s, m and n are such that (i) < n < and (ii)

1 —fi-—1 

s < "h If there is a t-probe scheme on a universe of size m that uses space at most 

s, then there are disjoint sets S and T of size at most n each such that for every assignment to 
the memory, some query in S is answered with a ‘No’ or some query in T is answered with a 
‘Yes’. 

Proof. For $t = 2$, the claim is established in the proof of Theorem 1.4 (see the last line of the proof). We use induction on $t$ to generalize this claim to larger values of $t$. Assume the claim is true for $t = k - 1$; we show that it holds for $t = k$. Fix $m$, $n$ and a $k$-probe scheme that satisfy our assumptions. We now show how the sets $S$ and $T$ are obtained. There is a cell to which at least $m/s$ of the elements make their first probe: call this set of elements $U'$. By fixing the value of this cell at 0, we obtain a $(k-1)$-probe scheme for the universe $U'$. We will verify that the assumptions needed for induction are satisfied for this scheme. We conclude by induction that there are disjoint sets $S_0, T_0 \subseteq U'$, each of size at most $\lfloor n/2 \rfloor$. Let $U'' = U' \setminus (S_0 \cup T_0)$. Now, assume that the cell has value 1, and apply induction to the resulting $(k-1)$-probe scheme (for the universe $U''$) to obtain sets $S_1$ and $T_1$ of size at most $\lfloor n/2 \rfloor$. Our claim for $t = k$ then follows immediately by taking $S = S_0 \cup S_1$ and $T = T_0 \cup T_1$.

It remains to verify that the assumptions (i) and (ii) needed for the induction hypothesis do in fact hold. Since $|U'| \ge |U''|$, it is enough to verify the conditions for $U''$. Now $|U'| \ge m/s \ge 2n$. Thus, $m' = |U''| \ge |U'| - n \ge m/(2s)$. We need to find sets of size at most $n' = \lfloor n/2 \rfloor$. Clearly $n' \ge n/4$, so condition (i) holds. Also, since $m' \ge m/(2s)$ and $n' \ge n/4$, the bound on $s$ demanded by condition (ii) holds for $m'$ and $n'$, so condition (ii) holds. □


6 General upper bound: non-adaptive (Theorem 1.6)


Definition 6.1. A non-adaptive $(m, s, t)$-graph is a bipartite graph $G$ with vertex sets $U = [m]$ and $V$ ($|V| = ts$). $V$ is partitioned into $t$ disjoint sets $V_1, \ldots, V_t$; each $V_i$ has $s$ vertices. Every $u \in U$ has a unique neighbour in each $V_i$. A non-adaptive $(m, s, t)$-graph naturally gives rise to a non-adaptive $(m, ts, t)$-query scheme $\mathcal{T}_G$ as follows. We view the memory (an array $L$ of $ts$ bits) as indexed by vertices in $V$. On receiving the query “Is $u$ in $S$?”, we answer “Yes” iff the majority of the locations in the neighbourhood of $u$ contain a 1. We say that the query scheme $\mathcal{T}_G$ is satisfiable for a set $S \subseteq [m]$ if there is an assignment to the memory locations $\{L[v] : v \in V\}$ such that $\mathcal{T}_G$ correctly answers all queries of the form “Is $x$ in $S$?”.
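The majority-vote scheme $\mathcal{T}_G$ of Definition 6.1 fits in a few lines of Python. The representation here is our own choice and is hypothetical, not the paper's. `graph[u][i]` is the index (in `range(s)`) of $u$'s neighbour inside block $V_i$, and `memory[i][j]` is the bit stored at vertex $j$ of $V_i$.

```python
import random

def random_nonadaptive_graph(m, s, t, rng):
    """graph[u][i] is the index (in range(s)) of u's unique neighbour in block V_i."""
    return [[rng.randrange(s) for _ in range(t)] for _ in range(m)]

def majority_query(graph, memory, u):
    """Non-adaptive query: probe u's t neighbours (one per block) and
    answer 'Yes' iff a strict majority of the probed bits are 1."""
    ones = sum(memory[i][j] for i, j in enumerate(graph[u]))
    return 2 * ones > len(graph[u])

rng = random.Random(0)
m, s, t = 50, 8, 5
graph = random_nonadaptive_graph(m, s, t, rng)
memory = [[0] * s for _ in range(t)]          # an all-zero memory answers 'No' everywhere
print(any(majority_query(graph, memory, u) for u in range(m)))   # False
```

All $t$ probe locations are fixed by $u$ alone, which is what makes the scheme non-adaptive. With $t$ odd, the majority is always well defined.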

We now restrict attention to odd $t \ge 5$. First, we identify an appropriate property of the underlying non-adaptive $(m, s, t)$-graph $G$ that guarantees that $\mathcal{T}_G$ is satisfiable for all sets $S$ of size at most $n$. We then show that such a graph exists for some $s = O(m^{\frac{2}{t-1}} n^{1-\frac{2}{t-1}} \lg\frac{2m}{n})$.

Definition 6.2 (Non-adaptive admissible graph). We say that a non-adaptive $(m, s, t)$-graph $G$ is admissible for sets of size at most $n$ if the following two properties hold:

(P1) $\forall R \subseteq [m]$ ($|R| \le n + \lceil 2n\lg(2m/n)\rceil$): $|\Gamma_G(R)| \ge \frac{t+1}{2}|R|$, where $\Gamma_G(R)$ is the set of neighbors of $R$ in $G$.

(P2) $\forall S \subseteq [m]$ ($|S| = n$): $|T_S| \le \lceil 2n\lg(2m/n)\rceil$, where $T_S = \{y \in [m] \setminus S : |\Gamma_G(y) \cap \Gamma_G(S)| \ge \frac{t-1}{2}\}$.

Our theorem will follow from the following claims. 

Lemma 6.3. If a non-adaptive $(m, s, t)$-graph $G$ is admissible for sets of size at most $n$, then the non-adaptive $(m, ts, t)$-query scheme $\mathcal{T}_G$ is satisfiable for every set $S$ of size at most $n$.

Lemma 6.4. There is a non-adaptive $(m, s, t)$-graph, with $s = O(m^{\frac{2}{t-1}} n^{1-\frac{2}{t-1}} \lg\frac{2m}{n})$, that is admissible for every set $S \subseteq [m]$ of size at most $n$.

Proof of Lemma 6.3. Fix an admissible graph $G$; thus, $G$ satisfies (P1) and (P2) above. Fix a set $S \subseteq [m]$ of size at most $n$. We will show that there is a 0-1 assignment to the memory such that all queries are answered correctly by $\mathcal{T}_G$.

Let $S' \subseteq [m]$ be such that $S \subseteq S'$ and $|S'| = n$. From (P2), we know $|T_{S'}| \le \lceil 2n\lg(2m/n)\rceil$. Hence, $|S' \cup T_{S'}| \le n + \lceil 2n\lg(2m/n)\rceil$. From (P1) and Hall's theorem, we may assign to each element $u \in S' \cup T_{S'}$ a set $A_u \subseteq \Gamma_G(u)$ such that (i) $|A_u| = \frac{t+1}{2}$ and (ii) the $A_u$'s are disjoint. For each $u \in S$, we assign the value 1 to all locations in $A_u$. For each $u \in (S' \cup T_{S'}) \setminus S$, we assign the value 0 to all locations in $A_u$. Since $A_u$ is a majority of the $t$ locations probed for $u$, all queries for $u \in S' \cup T_{S'}$ are answered correctly.

Assign 0 to all remaining locations. Then every location holding a 1 lies in $\Gamma_G(S')$, and for $y \in [m] \setminus (S' \cup T_{S'})$ we have $|\Gamma_G(y) \cap \Gamma_G(S')| < \frac{t-1}{2}$. As a result, queries for elements in $[m] \setminus (S' \cup T_{S'})$ are answered correctly, as the majority evaluates to 0 for each one of them. □
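The assignment in the proof of Lemma 6.3 can be made concrete. The Python sketch below uses hypothetical helper names and takes $S' = S$. It assumes that $T_S$ collects the non-members with at least $(t-1)/2$ neighbours in $\Gamma_G(S)$. The sets $A_u$ come from a bipartite matching in which every $u \in S \cup T_S$ is copied $(t+1)/2$ times. The matching is found with Kuhn's augmenting-path algorithm, one standard way to realize Hall's theorem algorithmically. The sketch then writes 1s on $A_u$ for $u \in S$ and 0s everywhere else. It returns `None` if the matching fails, which can happen when the graph is not admissible.

```python
import random

def majority_query(graph, memory, u):
    """Answer 'Yes' iff a strict majority of u's t probed bits are 1."""
    ones = sum(memory[i][j] for i, j in enumerate(graph[u]))
    return 2 * ones > len(graph[u])

def build_memory(graph, s, t, S):
    """Sketch of the assignment in the proof of Lemma 6.3, taking S' = S.
    Returns t lists of s bits, or None if the disjoint sets A_u cannot be found."""
    members = set(S)
    k = (t + 1) // 2                       # |A_u|: a strict majority of the t probes
    gamma_S = {v for u in S for v in enumerate(graph[u])}
    # T_S: non-members with at least (t-1)/2 neighbours inside Gamma_G(S)
    T = [y for y in range(len(graph)) if y not in members
         and sum(v in gamma_S for v in enumerate(graph[y])) >= (t - 1) // 2]
    owner = {}                             # vertex (i, j) -> (u, copy) matched to it

    def augment(copy, seen):
        # Kuhn's augmenting path: give this copy of u a free neighbour, possibly
        # re-routing the copy that currently holds one of u's neighbours.
        for v in enumerate(graph[copy[0]]):
            if v not in seen:
                seen.add(v)
                if v not in owner or augment(owner[v], seen):
                    owner[v] = copy
                    return True
        return False

    for u in list(S) + T:
        for c in range(k):                 # k copies of u, so |A_u| = k
            if not augment((u, c), set()):
                return None                # Hall's condition fails for this graph
    memory = [[0] * s for _ in range(t)]   # every location outside the A_u (u in S) is 0
    for (i, j), (u, _) in owner.items():
        if u in members:
            memory[i][j] = 1               # A_u is all ones for u in S
    return memory

rng = random.Random(1)
m, s, t = 100, 1000, 5
graph = [[rng.randrange(s) for _ in range(t)] for _ in range(m)]
S = [0, 1, 2]
memory = build_memory(graph, s, t, S)
print(memory is not None and all(majority_query(graph, memory, u) == (u in S) for u in range(m)))
```

Whenever `build_memory` succeeds, every query is answered correctly. Members see at least $(t+1)/2$ ones. Elements of $T_S$ see at most $(t-1)/2$ ones. Every other element has fewer than $(t-1)/2$ neighbours in $\Gamma_G(S)$, which is the only place where 1s are stored.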


Proof of Lemma 6.4. In the following, set

$$s = 60\, m^{\frac{2}{t-1}}\, n^{1-\frac{2}{t-1}} \lg\frac{2m}{n}.$$

We show that a suitable random non-adaptive $(m, s, t)$-graph $G$ is admissible for sets of size at most $n$ with positive probability. The graph $G$ is constructed as follows. Recall that $V = \bigcup_{i \in [t]} V_i$. For each $u \in U$, one neighbor is chosen uniformly and independently in each $V_i$.

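(P1) holds. If (P1) fails, then for some non-empty $W \subseteq U$ ($|W| \le n + \lceil 2n\lg(2m/n)\rceil$), we have $|\Gamma_G(W)| \le \frac{t+1}{2}|W| - 1$. Fix a set $W$ of size $r \ge 1$ and $L \subseteq V$ of size $\frac{t+1}{2}r - 1$. Let $L$ have $\ell_i$ elements in $V_i$; thus, $\sum_i \ell_i = \frac{t+1}{2}r - 1$. Then,

$$\Pr[\Gamma_G(W) \subseteq L] \le \prod_{i=1}^{t}\left(\frac{\ell_i}{s}\right)^{r} \le \left(\frac{\frac{t+1}{2}r - 1}{ts}\right)^{tr},$$

where the last inequality is a consequence of GM $\le$ AM. We conclude, using the union bound over choices of $W$ and $L$, that (P1) fails with probability at most

$$\sum_{r=1}^{n+\lceil 2n\lg(2m/n)\rceil}\binom{m}{r}\binom{ts}{\frac{t+1}{2}r-1}\left(\frac{\frac{t+1}{2}r-1}{ts}\right)^{tr} \tag{6.1}$$

$$\le \sum_{r=1}^{n+\lceil 2n\lg(2m/n)\rceil}\frac{r}{s}\left[\frac{em}{r}\, e^{\frac{t+1}{2}}\left(\frac{r}{s}\right)^{\frac{t-1}{2}}\right]^{r} \le \frac{1}{3}, \tag{6.2}$$

where the last inequality holds because we have chosen $s$ large enough.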

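(P2) holds. For (P2) to fail, there must exist a set $S \subseteq [m]$ of size $n$ such that $|T_S| > \lceil 2n\lg(2m/n)\rceil$. Fix a set $S$ of size $n$. Fix a $y \in [m] \setminus S$. Since $\Gamma_G(S)$ contains at most $n$ vertices of each $V_i$, each neighbour of $y$ lies in $\Gamma_G(S)$ with probability at most $n/s$, so

$$\Pr[y \in T_S] \le \binom{t}{\frac{t-1}{2}}\left(\frac{n}{s}\right)^{\frac{t-1}{2}} \le \frac{n}{10m},$$

where the last inequality holds because of the choice of $s$ and because $m$ is large. Thus, $\mathbb{E}[|T_S|] \le n/10$.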
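The estimate for $\Pr[y \in T_S]$ can be checked numerically. This assumes, as in (P2), that $T_S$ collects the $y \notin S$ with at least $(t-1)/2$ neighbours in $\Gamma_G(S)$. A union bound over which $(t-1)/2$ of $y$'s $t$ neighbours land in $\Gamma_G(S)$ gives $\binom{t}{(t-1)/2}(n/s)^{(t-1)/2}$. With $s = 60\,m^{2/(t-1)}n^{1-2/(t-1)}\lg(2m/n)$, the ratio of this estimate to $n/(10m)$ simplifies to $10\binom{t}{(t-1)/2}/(60\lg(2m/n))^{(t-1)/2}$, which is below 1 for every odd $t \ge 5$. The sketch (hypothetical helper `ts_ratio`) just evaluates that ratio.

```python
import math

def ts_ratio(m, n, t):
    """Ratio of C(t, (t-1)/2) * (n/s)^((t-1)/2) to n/(10m), where
    s = 60 m^(2/(t-1)) n^(1-2/(t-1)) lg(2m/n), the choice of s in the proof of Lemma 6.4.
    A value below 1 means the union-bound estimate for Pr[y in T_S] is at most n/(10m)."""
    k = (t - 1) // 2
    s = 60 * m ** (2 / (t - 1)) * n ** (1 - 2 / (t - 1)) * math.log2(2 * m / n)
    return math.comb(t, k) * (n / s) ** k / (n / (10 * m))

for m, n, t in [(10**6, 10, 5), (10**6, 1000, 7), (10**9, 10**4, 9)]:
    print(m, n, t, ts_ratio(m, n, t))
```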

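To conclude that $|T_S|$ is small with high probability, we use the following version of the Chernoff bound: if $X = \sum_i X_i$, where the $X_i \in \{0,1\}$ are independent random variables, and $\gamma \ge 2e\,\mathbb{E}[X]$, then $\Pr[X \ge \gamma] \le 2^{-\gamma}$. (For fixed $S$, once the neighbours of $S$ are fixed, the events $y \in T_S$ for distinct $y$ are independent.) Then, for all large $m$,

$$\Pr\bigl[|T_S| \ge 2n\lg(2m/n)\bigr] \le 2^{-2n\lg(2m/n)}.$$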
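The Chernoff bound used here states that if $\gamma \ge 2e\,\mathbb{E}[X]$ then $\Pr[X \ge \gamma] \le 2^{-\gamma}$. It can be spot-checked against exact binomial tails. The sketch below covers only the special case of identically distributed indicators, $X \sim \mathrm{Binomial}(N, p)$, at the smallest permitted threshold $\gamma = 2e\,\mathbb{E}[X]$. The helper name is ours.

```python
import math

def binomial_upper_tail(N, p, gamma):
    """Exact Pr[X >= gamma] for X ~ Binomial(N, p), 0 < p < 1, summing the pmf."""
    k0 = math.ceil(gamma)
    pmf = (1 - p) ** N                            # Pr[X = 0]
    tail = 0.0
    for k in range(N + 1):
        if k >= k0:
            tail += pmf
        pmf *= (N - k) / (k + 1) * p / (1 - p)    # Pr[X = k+1] from Pr[X = k]
    return tail

for N, p in [(1000, 0.001), (500, 0.01), (200, 0.05)]:
    gamma = 2 * math.e * N * p                    # smallest threshold the bound allows
    tail = binomial_upper_tail(N, p, gamma)
    print(N, p, f"tail={tail:.2e}", f"2^-gamma={2 ** -gamma:.2e}")
```

The bound follows from the usual multiplicative Chernoff estimate $\Pr[X \ge \gamma] \le e^{-\mathbb{E}[X]}(e\,\mathbb{E}[X]/\gamma)^{\gamma}$, whose base is at most $1/2$ once $\gamma \ge 2e\,\mathbb{E}[X]$.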

Using the union bound, we conclude that

$$\Pr[\text{(P2) fails}] \le \binom{m}{n} 2^{-2n\lg(2m/n)} \le \frac{1}{3}.$$

Thus, with probability at least 1/3 the random graph $G$ is admissible.

□ 


7 General upper bound: adaptive (Theorem 1.7)

In order to show that $s(m, n, t)$ is small, we exhibit efficient adaptive schemes that store sets of size exactly $n$. This implies our bound (where we allow sets of size at most $n$) for the following reason. We may pad the universe with $n$ additional elements and extend $S$ ($|S| \le n$) by adding $n - |S|$ of these additional elements. This gives a subset of size exactly $n$ in a universe of size $m + n \le 2m$.
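The padding step is easy to illustrate. In this minimal Python sketch (function name hypothetical), elements are integers, the original universe is $\{0, \ldots, m-1\}$, and the dummy elements are $m, m+1, \ldots$. For every $x < m$, a query on the padded set gets the same answer as a query on $S$.

```python
def pad_to_size_n(S, m, n):
    """Embed S, a subset of {0,...,m-1} with |S| <= n, into the universe {0,...,m+n-1}
    and add the n - |S| smallest dummy elements m, m+1, ... so the result has size n."""
    if len(S) > n:
        raise ValueError("|S| must be at most n")
    return set(S) | set(range(m, m + n - len(S)))

padded = pad_to_size_n({3, 7}, m=10, n=5)
print(sorted(padded))   # [3, 7, 10, 11, 12]
```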

Definition 7.1. An adaptive $(m, s, t)$-graph is a bipartite graph $G$ with vertex sets $U = [m]$ and $V$ ($|V| = (2^t - 1)s$). $V$ is partitioned into $2^t - 1$ disjoint sets $A, A_0, A_1, A_{00}, \ldots$, that is, one set $A_\sigma$ for each $\sigma \in \{0,1\}^{<t}$; each $A_\sigma$ has $s$ vertices. Between each $u \in U$ and each $A_\sigma$ there is exactly one edge. Let $V_i := \bigcup_{\sigma : |\sigma| = i-1} A_\sigma$. An $(m, s, t)$-graph naturally gives rise to a systematic $(m, (2^t - 1)s, t)$-query scheme $\mathcal{T}_G$ as follows. We view the memory (an array $L$ of $(2^t - 1)s$ bits) as indexed by vertices in $V$. For query element $u \in U$, if the first $i - 1$ probes resulted in values $\sigma \in \{0,1\}^{i-1}$, then the $i$-th probe is made to the location indexed by the unique neighbor of $u$ in $A_\sigma$. In particular, the $i$-th probe is made at a location in $V_i$. We answer “Yes” iff the last bit read is 1. We refer to $V_t$ as the leaves of $G$, and for $y \in [m]$, let $\mathrm{leaves}(y) := V_t \cap \Gamma_G(y)$. For $R \subseteq [m]$, let $\mathrm{leaves}(R) := V_t \cap \Gamma_G(R)$.

We say that the query scheme $\mathcal{T}_G$ is satisfiable for a set $S \subseteq [m]$ if there is an assignment to the memory locations $\{L[v] : v \in V\}$ such that $\mathcal{T}_G$ correctly answers all queries of the form “Is $x$ in $S$?”.
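The adaptive probing rule of Definition 7.1 can be sketched in Python. The representation is our own and hypothetical. Blocks $A_\sigma$ are keyed by bit-strings `sigma`, `graph[u][sigma]` is $u$'s neighbour inside $A_\sigma$, and `memory[(sigma, j)]` is the bit stored at vertex $j$ of $A_\sigma$.

```python
import random
from itertools import product

def prefixes(t):
    """Bit-strings sigma with |sigma| < t, one per block A_sigma (2**t - 1 blocks)."""
    return ["".join(bits) for d in range(t) for bits in product("01", repeat=d)]

def random_adaptive_graph(m, s, t, rng):
    """graph[u][sigma]: index (in range(s)) of u's unique neighbour in block A_sigma."""
    return [{sigma: rng.randrange(s) for sigma in prefixes(t)} for _ in range(m)]

def adaptive_query(graph, memory, u, t):
    """memory[(sigma, j)] is the bit at vertex j of A_sigma.  Having read the bits
    sigma, probe u's neighbour in A_sigma; answer 'Yes' iff the t-th bit read is 1."""
    sigma = ""
    for _ in range(t - 1):
        sigma += str(memory[(sigma, graph[u][sigma])])
    return memory[(sigma, graph[u][sigma])] == 1

def leaves(graph, y, t):
    """leaves(y): y's neighbours in V_t, i.e. in the blocks A_sigma with |sigma| = t - 1."""
    return {(sigma, j) for sigma, j in graph[y].items() if len(sigma) == t - 1}

rng = random.Random(0)
t, s = 3, 4
graph = random_adaptive_graph(20, s, t, rng)
memory = {(sigma, j): 0 for sigma in prefixes(t) for j in range(s)}
g = graph[0]
memory[("", g[""])] = 1        # first probe of element 0 reads 1 ...
memory[("10", g["10"])] = 1    # ... second reads 0 (A_1 untouched), third reads 1
print(adaptive_query(graph, memory, 0, t))   # True
```

Each query touches exactly one vertex of each $V_i$, and only the $2^{t-1}$ leaves of an element can decide its answer.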

We assume that $t \ge 3$ is odd. We show that for every $\varepsilon > 0$, and for $n$ and $t$ in the ranges of Theorem 1.7 (in particular, $t$ is at most a constant multiple of $\lg\lg m$), $s(m,n,t) = O(\exp(e^{2t})\, m^{\frac{2}{t+1}} n^{1-\frac{2}{t+1}} \lg m)$. Our $t$-probe scheme will have two parts: a $t_1$-probe non-adaptive part and a $t_2$-probe adaptive part, such that $t_1 + t_2 = t$. The two parts will be based on an appropriate non-adaptive $(m, s, t_1)$-graph $G_1$ and an adaptive $(m, s, t_2)$-graph $G_2$, respectively. To decide set membership, we check set membership in the two parts separately and take the AND, that is, we answer “Yes” iff all bits read in $\mathcal{T}_{G_1}$ are 1 and the last bit read in $\mathcal{T}_{G_2}$ is 1. We refer to this scheme as $\mathcal{T}_{G_1} \wedge \mathcal{T}_{G_2}$.
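The combined scheme $\mathcal{T}_{G_1} \wedge \mathcal{T}_{G_2}$ can be sketched with a hypothetical representation. The non-adaptive graph is a list of block indices per element, and the adaptive graph maps bit-strings $\sigma$ to neighbours. The split $t_1 = 1$, $t_2 = 4$ in the usage lines is only an illustration.

```python
import random
from itertools import product

def prefixes(t):
    """Bit-strings of length < t: one block A_sigma per string."""
    return ["".join(b) for d in range(t) for b in product("01", repeat=d)]

def and_query(g1, mem1, g2, mem2, u, t2):
    """Answer of T_{G1} AND T_{G2} on query u.
    g1[u][i]: u's neighbour in block i of the non-adaptive graph G1 (bit in mem1[i][j]).
    g2[u][sigma]: u's neighbour in block A_sigma of the adaptive graph G2 (mem2[(sigma, j)])."""
    # Non-adaptive part: every bit read must be 1.  Stopping early here only saves
    # probes; it does not change the answer.
    if not all(mem1[i][j] == 1 for i, j in enumerate(g1[u])):
        return False
    # Adaptive part: follow the bits read so far; the last bit read is the answer.
    sigma = ""
    for _ in range(t2 - 1):
        sigma += str(mem2[(sigma, g2[u][sigma])])
    return mem2[(sigma, g2[u][sigma])] == 1

rng = random.Random(0)
m, s, t1, t2 = 30, 6, 1, 4          # t = t1 + t2 = 5 probes in total
g1 = [[rng.randrange(s) for _ in range(t1)] for _ in range(m)]
g2 = [{sg: rng.randrange(s) for sg in prefixes(t2)} for _ in range(m)]
mem1 = [[1] * s for _ in range(t1)]
mem2 = {(sg, j): 1 for sg in prefixes(t2) for j in range(s)}
print(all(and_query(g1, mem1, g2, mem2, u, t2) for u in range(m)))   # True
```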

First, we identify appropriate properties of the underlying graphs $G_1$ and $G_2$ that guarantee that all queries are answered correctly for sets of size $n$. We then show that such graphs exist with $s = O(\exp(e^{2t} - t)\, m^{\frac{2}{t+1}} n^{1-\frac{2}{t+1}} \lg m)$.

We will use the following constants in our calculations: $\alpha := 2^{t_2} - 1$ and $\beta := 2^{t_2} - t_2$. Note that $\alpha$ is the total number of nodes in a $t_2$-probe adaptive decision tree. In any such decision tree, for every choice of $\beta$ nodes and every choice $b \in \{0,1\}$ of the answer, it is possible to assign values to those $\beta$ nodes so that the decision tree returns the answer $b$.
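The decision-tree fact in the preceding paragraph can be checked by brute force for small $t_2$. It says that any $\beta = 2^{t_2} - t_2$ of the $\alpha = 2^{t_2} - 1$ nodes suffice to force either answer. The Python sketch below models the tree as bit-strings of length $< t_2$, treats the unchosen nodes as adversarial, and decides forceability recursively. The subtrees are disjoint, so this recursion agrees with the existence of a single assignment to the chosen nodes that works against every assignment to the others. The sketch also reports whether $\beta - 1$ nodes would always suffice. They do not, because an adversary that owns a whole root-to-leaf path controls the answer.

```python
from itertools import combinations

def nodes(t2):
    """Nodes of a complete t2-probe decision tree: bit-strings of length < t2."""
    out = [""]
    for depth in range(1, t2):
        out += [format(x, f"0{depth}b") for x in range(2 ** depth)]
    return out

def can_force(sigma, t2, chosen):
    """Can values on the nodes in `chosen` force the walk from `sigma` to return a
    prescribed bit b, whatever the other nodes hold?  By 0/1 symmetry, b is irrelevant."""
    last = len(sigma) == t2 - 1            # node probed in the t2-th (final) step
    if sigma in chosen:
        # We pick this node's value: store b at the last probe, else steer into a child.
        return last or any(can_force(sigma + v, t2, chosen) for v in "01")
    # Adversarial node: it picks the worst value (at the last probe, it outputs 1 - b).
    return (not last) and all(can_force(sigma + v, t2, chosen) for v in "01")

def every_subset_forces(t2, k):
    """True iff every set of k nodes can force the answer."""
    return all(can_force("", t2, set(c)) for c in combinations(nodes(t2), k))

for t2 in range(1, 5):
    alpha, beta = 2 ** t2 - 1, 2 ** t2 - t2
    print(t2, alpha, beta, every_subset_forces(t2, beta), every_subset_forces(t2, beta - 1))
# expected: the last two columns are True and False on every row
```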

Definition 7.2 (admissible pair). We say that a non-adaptive $(m, s, t_1)$-graph $G_1$ and an adaptive $(m, s, t_2)$-graph $G_2$ form an admissible pair $(G_1, G_2)$ for sets of size $n$ if the following conditions hold.

(P1) $\forall S \subseteq [m]$ ($|S| = n$): $|\mathrm{survivors}(S)| \le 10m(n/s)^{t_1}$, where $\mathrm{survivors}(S) = \{y \notin S : \Gamma_{G_1}(y) \subseteq \Gamma_{G_1}(S)\}$.

(P2) For $S \subseteq [m]$ ($|S| = n$), let $\mathrm{survivors}^+(S) = \{y \in \mathrm{survivors}(S) : \mathrm{leaves}_{G_2}(S) \cap \mathrm{leaves}_{G_2}(y) \ne \emptyset\}$. Then $\forall S \subseteq [m]$ ($|S| = n$) $\forall T \subseteq S \cup \mathrm{survivors}^+(S)$: $|\Gamma_{G_2}(T)| \ge \beta|T|$.

Lemma 7.3. If a non-adaptive $(m, s, t_1)$-graph $G_1$ and an adaptive $(m, s, t_2)$-graph $G_2$ form an admissible pair for sets of size $n$, then the query scheme $\mathcal{T}_{G_1} \wedge \mathcal{T}_{G_2}$ is satisfiable for every set $S \subseteq [m]$ of size $n$.

Lemma 7.4. Let $t \ge 3$ be an odd number; let $t_1 = \frac{t-3}{2}$ and $t_2 = \frac{t+3}{2}$. Then there exists an admissible pair of graphs consisting of a non-adaptive $(m, s, t_1)$-graph $G_1$ and an adaptive $(m, s, t_2)$-graph $G_2$ with $s = O(\exp(e^{2t} - t)\, m^{\frac{2}{t+1}} n^{1-\frac{2}{t+1}} \lg m)$.

Proof of Lemma 7.3. Fix an admissible pair $(G_1, G_2)$; thus, $G_1$ satisfies (P1) and $G_2$ satisfies (P2) above. Fix a set $S \subseteq [m]$ of size $n$. We will show that there is an assignment under which $\mathcal{T}_{G_1} \wedge \mathcal{T}_{G_2}$ answers all questions of the form “Is $x$ in $S$?” correctly.

The assignment is constructed as follows. Assign 1 to all locations in $\Gamma_{G_1}(S)$ and 0 to the remaining locations in $V(G_1)$. Thus, $\mathcal{T}_{G_1}$ answers “Yes” for all query elements in $S$ and answers “No” for all query elements outside $S \cup \mathrm{survivors}(S)$. However, it (incorrectly) answers “Yes” for elements in $\mathrm{survivors}(S)$. We will now argue that these false positives can be eliminated using the scheme $\mathcal{T}_{G_2}$.
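The first step of this proof can be checked on a random non-adaptive graph. It stores 1s on $\Gamma_{G_1}(S)$ and 0s elsewhere, so that $\mathcal{T}_{G_1}$ says “Yes” exactly on $S \cup \mathrm{survivors}(S)$. The Python sketch below uses a hypothetical representation in which `g1[u][i]` is $u$'s neighbour in block $i$. It computes $\mathrm{survivors}(S)$ directly from Definition 7.2 and compares the result with the set of elements whose probes all read 1.

```python
import random

def survivors(g1, S):
    """survivors(S) = {y not in S : Gamma_{G1}(y) is contained in Gamma_{G1}(S)}."""
    gamma_S = {(i, j) for u in S for i, j in enumerate(g1[u])}
    members = set(S)
    return {y for y in range(len(g1))
            if y not in members and all(v in gamma_S for v in enumerate(g1[y]))}

def yes_set_after_first_step(g1, s, S):
    """Store 1 on Gamma_{G1}(S) and 0 elsewhere; return the elements for which
    every non-adaptive probe reads 1."""
    t1 = len(g1[0])
    mem = [[0] * s for _ in range(t1)]
    for u in S:
        for i, j in enumerate(g1[u]):
            mem[i][j] = 1
    return {u for u in range(len(g1)) if all(mem[i][j] for i, j in enumerate(g1[u]))}

rng = random.Random(2)
m, s, t1 = 300, 10, 2
g1 = [[rng.randrange(s) for _ in range(t1)] for _ in range(m)]
S = list(range(5))
yes = yes_set_after_first_step(g1, s, S)
print(len(survivors(g1, S)), "false positives; m (n/s)^t1 =", m * (len(S) / s) ** t1)
```

The last line compares the number of false positives with $m(n/s)^{t_1}$, the quantity that bounds their expected number. The parameters are tiny, so this comparison is only illustrative.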

Using (P2) and Hall's theorem, we may assign to each element $u \in S \cup \mathrm{survivors}^+(S)$ a set $L_u \subseteq \Gamma_{G_2}(u)$ such that (i) $|L_u| = \beta$ and (ii) the $L_u$'s are disjoint. Set $b_u = 1$ for $u \in S$ and $b_u = 0$ for $u \in \mathrm{survivors}^+(S)$ (some of the false positives). As observed above, for each $u \in S \cup \mathrm{survivors}^+(S)$ we may set the values in the locations in $L_u$ so that the value returned on the query element $u$ is precisely $b_u$. Since the $L_u$'s are disjoint, we may take such an action independently for each $u$. After this partial assignment, it remains to ensure that queries for elements $y \in \mathrm{survivors}(S) \setminus \mathrm{survivors}^+(S)$ (the remaining false positives) return a “No”. Consider any such $y$. By the definition of $\mathrm{survivors}^+(S)$, no location in $\mathrm{leaves}_{G_2}(y)$ has been assigned a value in the above partial assignment. Now, assign 0 to all unassigned locations in $V(G_2)$. Thus $\mathcal{T}_{G_2}$ returns the answer “No” for queries from $\mathrm{survivors}(S) \setminus \mathrm{survivors}^+(S)$. □


Proof of Lemma 7.4. In the following, let

$$s = \exp(e^{2t} - t)\, m^{\frac{2}{t+1}}\, n^{1-\frac{2}{t+1}} \lg m.$$

We will construct the non-adaptive $(m, s, t_1)$-graph $G_1$ and the adaptive $(m, s, t_2)$-graph $G_2$ randomly, and show that with positive probability the pair $(G_1, G_2)$ is admissible. The graph $G_1$ is constructed as in the proof of Lemma 6.4, and the analysis is similar. Recall that $V(G_1) = \bigcup_{i \in [t_1]} V_i(G_1)$. For each $u \in U$, one neighbor is chosen uniformly and independently from each $V_i(G_1)$.


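(P1) holds. Fix a set $S$ of size $n$. Then $\mathbb{E}[|\mathrm{survivors}(S)|] \le (m-n)(n/s)^{t_1} \le m(n/s)^{t_1}$. As before, using the Chernoff bound, we conclude that

$$\Pr\bigl[|\mathrm{survivors}(S)| > 10m(n/s)^{t_1}\bigr] \le 2^{-10m(n/s)^{t_1}}.$$

Then, by the union bound,

$$\Pr[\text{(P1) fails}] \le \binom{m}{n} 2^{-10m(n/s)^{t_1}} \le \frac{1}{10},$$

where the last inequality follows from our choice of $s$.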

Fix a graph $G_1$ such that (P1) holds. The random graph $G_2$ is constructed as follows. Recall that $V(G_2) = \bigcup_{z \in \{0,1\}^{<t_2}} A_z$. For each $u \in [m]$, one neighbor is chosen uniformly and independently from each $A_z$.

To establish (P2), we need to show that all sets of the form $S' \cup R$, where $S' \subseteq S$ and $R \subseteq \mathrm{survivors}^+(S)$, expand. To restrict the choices for $R$, we first show in Claim 7.5(a) that with high probability $\mathrm{survivors}^+(S)$ is small. Then, using direct calculations, we show that with high probability the required expansion is available in the random graph $G_2$.

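Claim 7.5. (a) Let $\mathcal{E}_a \equiv \forall S \subseteq [m]$ ($|S| = n$): $|\mathrm{survivors}^+(S)| \le 100 \cdot 2^{t_2} m (n/s)^{t_1+1}$; then $\Pr[\mathcal{E}_a] \ge 9/10$.

(b) Let $\mathcal{E}_b \equiv \forall R \subseteq [m]$ ($|R| \le n + \lceil n\lg m\rceil$): $|\Gamma_{G_2}(R)| \ge \beta|R|$; then $\Pr[\mathcal{E}_b] \ge 9/10$.

(c) Let $\mathcal{E}_c \equiv \forall S \subseteq [m]$ ($|S| = n$), $\forall S' \subseteq S$, $\forall R \subseteq \mathrm{survivors}^+(S)$ ($\lceil n\lg m\rceil \le |R| \le 100 \cdot 2^{t_2} m (n/s)^{t_1+1}$): $|\Gamma_{G_2}(S' \cup R)| \ge \beta|S' \cup R|$; then $\Pr[\mathcal{E}_c] \ge 9/10$.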


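Proof of Claim 7.5. Part (a) follows by a routine application of the Chernoff bound, as in several previous proofs. For a set $S$ of size $n$, we have $\mathbb{E}[|\mathrm{survivors}^+(S)|] \le |\mathrm{survivors}(S)| \cdot 2^{t_2}(n/s) \le 2^{t_2} \cdot 10m(n/s)^{t_1+1}$. Then,

$$\Pr[\neg\mathcal{E}_a] \le \binom{m}{n} 2^{-100 \cdot 2^{t_2} m (n/s)^{t_1+1}} \le \frac{1}{10},$$

where the last inequality holds because of our choice of $s$.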


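Next consider part (b). If $\mathcal{E}_b$ does not hold, then for some non-empty $W \subseteq [m]$ ($|W| \le n + \lceil n\lg m\rceil$), we have $|\Gamma_{G_2}(W)| \le \beta|W| - 1$. Fix a set $W$ of size $r \ge 1$ and $L \subseteq V(G_2)$ of size $\beta r - 1$. Let $L$ have $\ell_z$ elements in $A_z$. Then,

$$\Pr[\Gamma_{G_2}(W) \subseteq L] \le \prod_{z}\left(\frac{\ell_z}{s}\right)^{r} \le \left(\frac{\beta r - 1}{\alpha s}\right)^{\alpha r}.$$

We conclude, using the union bound over choices of $W$ and $L$, that the probability that $\mathcal{E}_b$ does not hold is at most

$$
\begin{aligned}
\sum_{r=1}^{n+\lceil n\lg m\rceil}\binom{m}{r}\binom{\alpha s}{\beta r-1}\left(\frac{\beta r-1}{\alpha s}\right)^{\alpha r}
&\le \sum_{r=1}^{n+\lceil n\lg m\rceil}\left(\frac{em}{r}\right)^{r}\left(\frac{e\alpha s}{\beta r-1}\right)^{\beta r-1}\left(\frac{\beta r-1}{\alpha s}\right)^{\alpha r}\\
&\le \sum_{r=1}^{n+\lceil n\lg m\rceil}\frac{\beta r}{e\alpha s}\left[\frac{em}{r}\, e^{\beta}\left(\frac{\beta r}{\alpha s}\right)^{\alpha-\beta}\right]^{r}\\
&\le \frac{1}{10},
\end{aligned}
$$

where the last inequality holds because of our choice of $s$.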

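Finally, we justify part (c). To bound the probability that $\mathcal{E}_c$ fails, we consider a set $S \subseteq [m]$ of size $n$, a subset $S' \subseteq S$ of size $i$ (say), a subset $R \subseteq \mathrm{survivors}^+(S)$ of size $r$ (where $\lceil n\lg m\rceil \le r \le 100 \cdot 2^{t_2} m (n/s)^{t_1+1}$), and $L \subseteq V(G_2)$ of size $\ell = \beta(i+r)$, and define the event

$$\mathcal{E}(S, S', R, L) \equiv \bigl(\forall y \in R:\ \mathrm{leaves}_{G_2}(S) \cap \mathrm{leaves}_{G_2}(y) \ne \emptyset\bigr) \wedge \Gamma_{G_2}(S' \cup R) \subseteq L.$$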


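Then,

$$\Pr[\mathcal{E}(S,S',R,L)] \le \left(\frac{2^{t_2}n}{s}\right)^{r}\left(\frac{\beta(i+r)}{(\alpha-1)s}\right)^{(\alpha-1)r}\left(\frac{\beta(i+r)}{\alpha s}\right)^{\alpha i} \tag{7.1}$$

$$\le \left(\frac{2^{t_2}n}{s}\right)^{r}\left(\frac{\beta(i+r)}{(\alpha-1)s}\right)^{(\alpha-1)(i+r)}\left(\frac{\beta(i+r)}{\alpha s}\right)^{i}, \tag{7.2}$$

where the factor $(2^{t_2}n/s)^{r}$ is justified because of the requirement that every $y \in R$ has at least one neighbour in $\mathrm{leaves}_{G_2}(S)$. The factor $\bigl(\frac{\beta(i+r)}{(\alpha-1)s}\bigr)^{(\alpha-1)r}$ is justified because all the remaining neighbours must lie in $L$ (we use AM $\ge$ GM). The last factor $\bigl(\frac{\beta(i+r)}{\alpha s}\bigr)^{\alpha i}$ is justified because all neighbours of elements in $S'$ lie in $L$ (again we use AM $\ge$ GM). To complete the argument we apply the union bound over the choices of $(S, S', R, L)$. Note that we may restrict attention to $\ell = \beta(i+r)$, because for our choice of $s$ we have $\beta(i+r) \le |V(G_2)| = \alpha s$. Thus, the probability that $\mathcal{E}_c$ fails to hold is at most

$$\sum_{S, S', R, L}\Pr[\mathcal{E}(S,S',R,L)].$$

Here $S$ ranges over sets of size $n$ and $S' \subseteq S$ over sets of size $i$. $R \subseteq \mathrm{survivors}(S)$ ranges over sets of size $r$ with $\lceil n\lg m\rceil \le r \le 100 \cdot 2^{t_2} m (n/s)^{t_1+1}$, and $L$ ranges over subsets of $V(G_2)$ of size $\beta(i+r)$. We evaluate this sum as follows.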


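$$\sum_{S,S',R,L}\Pr[\mathcal{E}(S,S',R,L)] \le \sum_{r}\sum_{i}\binom{m}{n}\binom{n}{i}\binom{10m(n/s)^{t_1}}{r}\binom{\alpha s}{\beta(i+r)}\left(\frac{2^{t_2}n}{s}\right)^{r}\left(\frac{\beta(i+r)}{(\alpha-1)s}\right)^{(\alpha-1)(i+r)}\left(\frac{\beta(i+r)}{\alpha s}\right)^{i} \tag{7.3}$$

$$\le \sum_{r}\sum_{i}\left(\frac{em}{n}\right)^{n}\left(\frac{en}{i}\right)^{i}\left(\frac{10em(n/s)^{t_1}}{r}\right)^{r}\left(\frac{e\alpha s}{\beta(i+r)}\right)^{\beta(i+r)}\left(\frac{2^{t_2}n}{s}\right)^{r}\left(\frac{\beta(i+r)}{(\alpha-1)s}\right)^{(\alpha-1)(i+r)}\left(\frac{\beta(i+r)}{\alpha s}\right)^{i} \tag{7.4}$$

$$= \sum_{r}\sum_{i}\left[\left(\frac{em}{n}\right)^{\frac{n}{i+r}}\left(\frac{en}{i}\right)^{\frac{i}{i+r}}\left(\frac{10em(n/s)^{t_1}}{r}\right)^{\frac{r}{i+r}}\left(\frac{e\alpha}{\alpha-1}\right)^{\beta}\left(\frac{\beta(i+r)}{(\alpha-1)s}\right)^{\alpha-\beta-1}\left(\frac{2^{t_2}n}{s}\right)^{\frac{r}{i+r}}\left(\frac{\beta(i+r)}{\alpha s}\right)^{\frac{i}{i+r}}\right]^{i+r} \tag{7.5}$$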


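We will show that the quantity inside the square brackets is at most $1/2$. Then, since $r \ge n\lg m$ and $i \ge 0$,

$$\Pr[\neg\mathcal{E}_c] \le \Bigl(\sum_{r \ge \lceil n\lg m\rceil} 2^{-r}\Bigr)\Bigl(\sum_{i \ge 0} 2^{-i}\Bigr) \le \frac{4}{m} \le \frac{1}{10}$$

for all large $m$.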

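The quantity in the brackets is at most the product of two factors, which we bound separately. Here we use

$$\left(\frac{\beta(i+r)}{\alpha s}\right)^{\frac{i}{i+r}} \le \left(\frac{\beta(i+r)}{2^{t_2}n(\alpha-1)}\right)^{\frac{i}{i+r}}\left(\frac{2^{t_2}n}{s}\right)^{\frac{i}{i+r}}.$$

Factor 1: Consider the following contributions:

$$\left(\frac{em}{n}\right)^{\frac{n}{i+r}}(10e)^{\frac{r}{i+r}}\left(\frac{en}{i}\right)^{\frac{i}{i+r}}\left(\frac{e\alpha}{\alpha-1}\right)^{\beta}\left(\frac{\beta(i+r)}{2^{t_2}n(\alpha-1)}\right)^{\frac{i}{i+r}}.$$

Since $r \ge n\lg m$ and $i \le n$, we have $n/(i+r) \le 1/\lg m$. Thus, for all large enough $m$, this quantity is at most

$$2e \cdot 10e \cdot e \cdot (2e)^{\beta} \cdot e \le \exp(e^{2t} - t).$$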

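Factor 2: We next bound the contribution of the remaining factors.

$$\left(\frac{m(n/s)^{t_1}}{r}\right)^{\frac{r}{i+r}}\left(\frac{\beta(i+r)}{(\alpha-1)s}\right)^{\alpha-\beta-1}\frac{2^{t_2}n}{s} \le \left(\frac{m(n/s)^{t_1}}{r}\right)^{\frac{r}{i+r}}\left(\frac{2r}{s}\right)^{\alpha-\beta-1}\frac{2^{t_2}n}{s} \tag{7.6}$$

$$\le \frac{m(n/s)^{t_1}}{r}\left(\frac{2r}{s}\right)^{\alpha-\beta-1}\frac{2^{t_2}n}{s} \tag{7.7}$$

$$= \frac{2^{\alpha-\beta+t_2-1}\, m\, n^{t_1+1}\, r^{\alpha-\beta-2}}{s^{\alpha-\beta+t_1}}. \tag{7.8}$$

In (7.6) we used $\beta \le \alpha - 1$ and $i \le n \le r$. To justify (7.7), recall that $r \le 100 \cdot 2^{t_2} m (n/s)^{t_1+1}$ and $s = \exp(e^{2t} - t)\, m^{\frac{2}{t+1}} n^{1-\frac{2}{t+1}} \lg m$; thus

$$\frac{m(n/s)^{t_1}}{r} \ge \frac{s}{100 \cdot 2^{t_2} n} \ge 1.$$

Then, the above quantity is bounded by

$$\frac{2^{\alpha-\beta+t_2-1}\, m\, n^{t_1+1}\,\bigl(100 \cdot 2^{t_2} m n^{t_1+1}\bigr)^{\alpha-\beta-2}}{s^{(t_1+1)(\alpha-\beta-2)}\, s^{\alpha-\beta+t_1}} \tag{7.9}$$

$$\le \left(\frac{100 \cdot 2^{2t_2}\, m\, n^{t_1+1}}{s^{t_1+2}}\right)^{\alpha-\beta-1}. \tag{7.10}$$

Thus, since $s = \exp(e^{2t} - t)\, m^{\frac{2}{t+1}} n^{1-\frac{2}{t+1}} \lg m$, the product of the two factors is at most $1/2$ for all large $m$, as required.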